parallel iterative procedures: Topics by Science.gov

Sample records for parallel iterative procedures

Run-time parallelization and scheduling of loops

NASA Technical Reports Server (NTRS)

Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

1991-01-01

Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
Run-time parallelization and scheduling of loops

NASA Technical Reports Server (NTRS)

Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

1990-01-01

Run time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases, where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run time, wave fronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi

PubMed Central

Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil

2012-01-01

The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
Exploiting parallel computing with limited program changes using a network of microcomputers

NASA Technical Reports Server (NTRS)

Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.

1985-01-01

Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronus processing was studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

NASA Technical Reports Server (NTRS)

Fijany, Amir

1993-01-01

In this paper, fast time- and Space -Parallel agorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completed decoupled.
Adaptive implicit-explicit and parallel element-by-element iteration schemes

NASA Technical Reports Server (NTRS)

Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.

1989-01-01

Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.
The Battlefield Environment Division Modeling Framework (BMF). Part 1: Optimizing the Atmospheric Boundary Layer Environment Model for Cluster Computing

DTIC Science & Technology

2014-02-01

idle waiting for the wavefront to reach it. To overcome this, Reeve et al. (2001) 3 developed a scheme in analogy to the red-black Gauss - Seidel iterative ...understandable procedure calls. Parallelization of the SIMPLE iterative scheme with SIP used a red-black scheme similar to the red-black Gauss - Seidel ...scheme, the SIMPLE method, for pressure-velocity coupling. The result is a slowing convergence of the outer iterations . The red-black scheme excites a 2
Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

2005-01-01

The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems, arised from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should fully take advantages of multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective. the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural, and acoustic (symmetrical and un-symmetrical) problems (in different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

NASA Astrophysics Data System (ADS)

Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

2017-07-01

Multi-thread programming using OpenMP on the shared-memory architecture with hyperthreading technology allows the resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, its speedup depends on the ability of the processor to execute threads in limited quantities, especially the sequential algorithm which contains a nested loop. The number of the outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique that had been found previously only be applied by the high-level programmer. This paper generates a parallelization procedure for low-level programmer in dealing with 2-level nested loop problems with the maximum number of threads that can be executed by a processor is smaller than the number of the outer loop iterations. Data preprocessing which is related to the number of the outer loop and the inner loop iterations, the computational time required to execute each iteration and the maximum number of threads that can be executed by a processor are used as a strategy to determine which parallel region that will produce optimal speedup.
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Exploiting loop level parallelism in nonprocedural dataflow programs

NASA Technical Reports Server (NTRS)

Gokhale, Maya B.

1987-01-01

Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.
Development of parallel algorithms for electrical power management in space applications

NASA Technical Reports Server (NTRS)

Berry, Frederick C.

1989-01-01

The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a loan flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.
Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

NASA Astrophysics Data System (ADS)

Zerr, Robert Joseph

2011-12-01

The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Fast iterative censoring CFAR algorithm for ship detection from SAR images

NASA Astrophysics Data System (ADS)

Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng

2017-11-01

Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure to eliminate the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; and then, an iterative censoring CFAR algorithm is used to detect ship candidates from each target blocks adaptively and efficiently, where parallel detection is available, and statistical parameters of G0 distribution fitting local sea clutter well can be quickly estimated based on an integral image operator. Experimental results of TerraSAR-X images demonstrate the effectiveness of the proposed technique.
Fast in-memory elastic full-waveform inversion using consumer-grade GPUs

NASA Astrophysics Data System (ADS)

Sivertsen Bergslid, Tore; Birger Raknes, Espen; Arntsen, Børge

2017-04-01

Full-waveform inversion (FWI) is a technique to estimate subsurface properties by using the recorded waveform produced by a seismic source and applying inverse theory. This is done through an iterative optimization procedure, where each iteration requires solving the wave equation many times, then trying to minimize the difference between the modeled and the measured seismic data. Having to model many of these seismic sources per iteration means that this is a highly computationally demanding procedure, which usually involves writing a lot of data to disk. We have written code that does forward modeling and inversion entirely in memory. A typical HPC cluster has many more CPUs than GPUs. Since FWI involves modeling many seismic sources per iteration, the obvious approach is to parallelize the code on a source-by-source basis, where each core of the CPU performs one modeling, and do all modelings simultaneously. With this approach, the GPU is already at a major disadvantage in pure numbers. Fortunately, GPUs can more than make up for this hardware disadvantage by performing each modeling much faster than a CPU. Another benefit of parallelizing each individual modeling is that it lets each modeling use a lot more RAM. If one node has 128 GB of RAM and 20 CPU cores, each modeling can use only 6.4 GB RAM if one is running the node at full capacity with source-by-source parallelization on the CPU. A parallelized per-source code using GPUs can use 64 GB RAM per modeling. Whenever a modeling uses more RAM than is available and has to start using regular disk space the runtime increases dramatically, due to slow file I/O. The extremely high computational speed of the GPUs combined with the large amount of RAM available for each modeling lets us do high frequency FWI for fairly large models very quickly. For a single modeling, our GPU code outperforms the single-threaded CPU-code by a factor of about 75. Successful inversions have been run on data with frequencies up to 40 Hz for a model of 2001 by 600 grid points with 5 m grid spacing and 5000 time steps, in less than 2.5 minutes per source. In practice, using 15 nodes (30 GPUs) to model 101 sources, each iteration took approximately 9 minutes. For reference, the same inversion run with our CPU code uses two hours per iteration. This was done using only a very simple wavefield interpolation technique, saving every second timestep. Using a more sophisticated checkpointing or wavefield reconstruction method would allow us to increase this model size significantly. Our results show that ordinary gaming GPUs are a viable alternative to the expensive professional GPUs often used today, when performing large scale modeling and inversion in geophysics.
Aerodynamic optimization by simultaneously updating flow variables and design parameters

NASA Technical Reports Server (NTRS)

Rizk, M. H.

1990-01-01

The application of conventional optimization schemes to aerodynamic design problems leads to inner-outer iterative procedures that are very costly. An alternative approach is presented based on the idea of updating the flow variable iterative solutions and the design parameter iterative solutions simultaneously. Two schemes based on this idea are applied to problems of correcting wind tunnel wall interference and optimizing advanced propeller designs. The first of these schemes is applicable to a limited class of two-design-parameter problems with an equality constraint. It requires the computation of a single flow solution. The second scheme is suitable for application to general aerodynamic problems. It requires the computation of several flow solutions in parallel. In both schemes, the design parameters are updated as the iterative flow solutions evolve. Computations are performed to test the schemes' efficiency, accuracy, and sensitivity to variations in the computational parameters.
On some Aitken-like acceleration of the Schwarz method

NASA Astrophysics Data System (ADS)

Garbey, M.; Tromeur-Dervout, D.

2002-12-01

In this paper we present a family of domain decomposition based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrids for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that our algorithm has high tolerance to slow network in the context of distributed parallel computing and is attractive, generally speaking, to use with computer architecture for which performance is limited by the memory bandwidth rather than the flop performance of the CPU. This is nowadays the case for most parallel. computer using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
An iterative method for systems of nonlinear hyperbolic equations

NASA Technical Reports Server (NTRS)

Scroggs, Jeffrey S.

1989-01-01

An iterative algorithm for the efficient solution of systems of nonlinear hyperbolic equations is presented. Parallelism is evident at several levels. In the formation of the iteration, the equations are decoupled, thereby providing large grain parallelism. Parallelism may also be exploited within the solves for each equation. Convergence of the interation is established via a bounding function argument. Experimental results in two-dimensions are presented.
Single-agent parallel window search

NASA Technical Reports Server (NTRS)

Powley, Curt; Korf, Richard E.

1991-01-01

Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A(asterisk) (IDA-asterisk) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA-asterisk by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.

Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2014-10-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson–Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chatterjee, Kausik, E-mail: kausik.chatterjee@aggiemail.usu.edu; Center for Atmospheric and Space Sciences, Utah State University, Logan, UT 84322; Roadcap, John R., E-mail: john.roadcap@us.af.mil

The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson–Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals ofmore » the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.« less
A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson-Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

NASA Astrophysics Data System (ADS)

Chatterjee, Kausik; Roadcap, John R.; Singh, Surendra

2014-11-01

The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson-Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals of the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.
Solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators.

PubMed

Zhao, Jing; Zong, Haili

2018-01-01

In this paper, we propose parallel and cyclic iterative algorithms for solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators. We also combine the process of cyclic and parallel iterative methods and propose two mixed iterative algorithms. Our several algorithms do not need any prior information about the operator norms. Under mild assumptions, we prove weak convergence of the proposed iterative sequences in Hilbert spaces. As applications, we obtain several iterative algorithms to solve the multiple-set split equality problem.
Parallelized implicit propagators for the finite-difference Schrödinger equation

NASA Astrophysics Data System (ADS)

Parker, Jonathan; Taylor, K. T.

1995-08-01

We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
On iterative processes in the Krylov-Sonneveld subspaces

NASA Astrophysics Data System (ADS)

Ilin, Valery P.

2016-10-01

The iterative Induced Dimension Reduction (IDR) methods are considered for solving large systems of linear algebraic equations (SLAEs) with nonsingular nonsymmetric matrices. These approaches are investigated by many authors and are charachterized sometimes as the alternative to the classical processes of Krylov type. The key moments of the IDR algorithms consist in the construction of the embedded Sonneveld subspaces, which have the decreasing dimensions and use the orthogonalization to some fixed subspace. Other independent approaches for research and optimization of the iterations are based on the augmented and modified Krylov subspaces by using the aggregation and deflation procedures with present various low rank approximations of the original matrices. The goal of this paper is to show, that IDR method in Sonneveld subspaces present an original interpretation of the modified algorithms in the Krylov subspaces. In particular, such description is given for the multi-preconditioned semi-conjugate direction methods which are actual for the parallel algebraic domain decomposition approaches.
Iterative algorithms for large sparse linear systems on parallel computers

NASA Technical Reports Server (NTRS)

Adams, L. M.

1982-01-01

Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Capabilities of Fully Parallelized MHD Stability Code MARS

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2016-10-01

Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2015-11-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
Twostep-by-twostep PIRK-type PC methods with continuous output formulas

NASA Astrophysics Data System (ADS)

Cong, Nguyen Huu; Xuan, Le Ngoc

2008-11-01

This paper deals with parallel predictor-corrector (PC) iteration methods based on collocation Runge-Kutta (RK) corrector methods with continuous output formulas for solving nonstiff initial-value problems (IVPs) for systems of first-order differential equations. At nth step, the continuous output formulas are used not only for predicting the stage values in the PC iteration methods but also for calculating the step values at (n+2)th step. In this case, the integration processes can be proceeded twostep-by-twostep. The resulting twostep-by-twostep (TBT) parallel-iterated RK-type (PIRK-type) methods with continuous output formulas (twostep-by-twostep PIRKC methods or TBTPIRKC methods) give us a faster integration process. Fixed stepsize applications of these TBTPIRKC methods to a few widely-used test problems reveal that the new PC methods are much more efficient when compared with the well-known parallel-iterated RK methods (PIRK methods), parallel-iterated RK-type PC methods with continuous output formulas (PIRKC methods) and sequential explicit RK codes DOPRI5 and DOP853 available from the literature.
Development of a stereo analysis algorithm for generating topographic maps using interactive techniques of the MPP

NASA Technical Reports Server (NTRS)

Strong, James P.

1987-01-01

A local area matching algorithm was developed on the Massively Parallel Processor (MPP). It is an iterative technique that first matches coarse or low resolution areas and at each iteration performs matches of higher resolution. Results so far show that when good matches are possible in the two images, the MPP algorithm matches corresponding areas as well as a human observer. To aid in developing this algorithm, a control or shell program was developed for the MPP that allows interactive experimentation with various parameters and procedures to be used in the matching process. (This would not be possible without the high speed of the MPP). With the system, optimal techniques can be developed for different types of matching problems.
On the Development of an Efficient Parallel Hybrid Solver with Application to Acoustically Treated Aero-Engine Nacelles

NASA Technical Reports Server (NTRS)

Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj

2006-01-01

A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
High-speed technique based on a parallel projection correlation procedure for digital image correlation

NASA Astrophysics Data System (ADS)

Zaripov, D. I.; Renfu, Li

2018-05-01

The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera is associated with big data processing and is often time consuming. In order to speed-up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique involves the use of interrogation window projections instead of its two-dimensional field of luminous intensity. This simplification allows acceleration of ZNCC computation up to 28.8 times compared to ZNCC calculated directly, depending on the size of interrogation window and region of interest. The results of three synthetic test cases, such as a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended to be used for initial velocity field calculation, with further correction using more accurate techniques.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Agile parallel bioinformatics workflow management using Pwrake.

PubMed

Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

2011-09-08

In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
Agile parallel bioinformatics workflow management using Pwrake

PubMed Central

2011-01-01

Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
AZTEC: A parallel iterative package for the solving linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1996-12-31

We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
Iterating between lessons on concepts and procedures can improve mathematics knowledge.

PubMed

Rittle-Johnson, Bethany; Koedinger, Kenneth

2009-09-01

Knowledge of concepts and procedures seems to develop in an iterative fashion, with increases in one type of knowledge leading to increases in the other type of knowledge. This suggests that iterating between lessons on concepts and procedures may improve learning. The purpose of the current study was to evaluate the instructional benefits of an iterative lesson sequence compared to a concepts-before-procedures sequence for students learning decimal place-value concepts and arithmetic procedures. In two classroom experiments, sixth-grade students from two schools participated (N=77 and 26). Students completed six decimal lessons on an intelligent-tutoring systems. In the iterative condition, lessons cycled between concept and procedure lessons. In the concepts-first condition, all concept lessons were presented before introducing the procedure lessons. In both experiments, students in the iterative condition gained more knowledge of arithmetic procedures, including ability to transfer the procedures to problems with novel features. Knowledge of concepts was fairly comparable across conditions. Finally, pre-test knowledge of one type predicted gains in knowledge of the other type across experiments. An iterative sequencing of lessons seems to facilitate learning and transfer, particularly of mathematical procedures. The findings support an iterative perspective for the development of knowledge of concepts and procedures.
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.
Organizing Compression of Hyperspectral Imagery to Allow Efficient Parallel Decompression

NASA Technical Reports Server (NTRS)

Klimesh, Matthew A.; Kiely, Aaron B.

2014-01-01

family of schemes has been devised for organizing the output of an algorithm for predictive data compression of hyperspectral imagery so as to allow efficient parallelization in both the compressor and decompressor. In these schemes, the compressor performs a number of iterations, during each of which a portion of the data is compressed via parallel threads operating on independent portions of the data. The general idea is that for each iteration it is predetermined how much compressed data will be produced from each thread.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aliaga, José I., E-mail: aliaga@uji.es; Alonso, Pedro; Badía, José M.

We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousandsmore » degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.« less
Parallelization of a hydrological model using the message passing interface

USGS Publications Warehouse

Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

2013-01-01

With the increasing knowledge about the natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further reduce rapid modeling and analysis. Using the widely-applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programing technology—Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a work station. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement becomes less due to the accompanied increase in demand for message passing procedures between the master and all slave processes. Our case study demonstrates that the P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of a project size). Overall, the P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

PubMed

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-06-01

We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

PubMed Central

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-01-01

We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
Computational aspects of helicopter trim analysis and damping levels from Floquet theory

NASA Technical Reports Server (NTRS)

Gaonkar, Gopal H.; Achar, N. S.

1992-01-01

Helicopter trim settings of periodic initial state and control inputs are investigated for convergence of Newton iteration in computing the settings sequentially and in parallel. The trim analysis uses a shooting method and a weak version of two temporal finite element methods with displacement formulation and with mixed formulation of displacements and momenta. These three methods broadly represent two main approaches of trim analysis: adaptation of initial-value and finite element boundary-value codes to periodic boundary conditions, particularly for unstable and marginally stable systems. In each method, both the sequential and in-parallel schemes are used and the resulting nonlinear algebraic equations are solved by damped Newton iteration with an optimally selected damping parameter. The impact of damped Newton iteration, including earlier-observed divergence problems in trim analysis, is demonstrated by the maximum condition number of the Jacobian matrices of the iterative scheme and by virtual elimination of divergence. The advantages of the in-parallel scheme over the conventional sequential scheme are also demonstrated.
Coarse mesh and one-cell block inversion based diffusion synthetic acceleration

NASA Astrophysics Data System (ADS)

Kim, Kang-Seog

DSA (Diffusion Synthetic Acceleration) has been developed to accelerate the SN transport iteration. We have developed solution techniques for the diffusion equations of FLBLD (Fully Lumped Bilinear Discontinuous), SCB (Simple Comer Balance) and UCB (Upstream Corner Balance) modified 4-step DSA in x-y geometry. Our first multi-level method includes a block Gauss-Seidel iteration for the discontinuous diffusion equation, uses the continuous diffusion equation derived from the asymptotic analysis, and avoids void cell calculation. We implemented this multi-level procedure and performed model problem calculations. The results showed that the FLBLD, SCB and UCB modified 4-step DSA schemes with this multi-level technique are unconditionally stable and rapidly convergent. We suggested a simplified multi-level technique for FLBLD, SCB and UCB modified 4-step DSA. This new procedure does not include iterations on the diffusion calculation or the residual calculation. Fourier analysis results showed that this new procedure was as rapidly convergent as conventional modified 4-step DSA. We developed new DSA procedures coupled with 1-CI (Cell Block Inversion) transport which can be easily parallelized. We showed that 1-CI based DSA schemes preceded by SI (Source Iteration) are efficient and rapidly convergent for LD (Linear Discontinuous) and LLD (Lumped Linear Discontinuous) in slab geometry and for BLD (Bilinear Discontinuous) and FLBLD in x-y geometry. For 1-CI based DSA without SI in slab geometry, the results showed that this procedure is very efficient and effective for all cases. We also showed that 1-CI based DSA in x-y geometry was not effective for thin mesh spacings, but is effective and rapidly convergent for intermediate and thick mesh spacings. We demonstrated that the diffusion equation discretized on a coarse mesh could be employed to accelerate the transport equation. Our results showed that coarse mesh DSA is unconditionally stable and is as rapidly convergent as fine mesh DSA in slab geometry. For x-y geometry our coarse mesh DSA is very effective for thin and intermediate mesh spacings independent of the scattering ratio, but is not effective for purely scattering problems and high aspect ratio zoning. However, if the scattering ratio is less than about 0.95, this procedure is very effective for all mesh spacing.
Accurate multi-robot targeting for keyhole neurosurgery based on external sensor monitoring.

PubMed

Comparetti, Mirko Daniele; Vaccarella, Alberto; Dyagilev, Ilya; Shoham, Moshe; Ferrigno, Giancarlo; De Momi, Elena

2012-05-01

Robotics has recently been introduced in surgery to improve intervention accuracy, to reduce invasiveness and to allow new surgical procedures. In this framework, the ROBOCAST system is an optically surveyed multi-robot chain aimed at enhancing the accuracy of surgical probe insertion during keyhole neurosurgery procedures. The system encompasses three robots, connected as a multiple kinematic chain (serial and parallel), totalling 13 degrees of freedom, and it is used to automatically align the probe onto a desired planned trajectory. The probe is then inserted in the brain, towards the planned target, by means of a haptic interface. This paper presents a new iterative targeting approach to be used in surgical robotic navigation, where the multi-robot chain is used to align the surgical probe to the planned pose, and an external sensor is used to decrease the alignment errors. The iterative targeting was tested in an operating room environment using a skull phantom, and the targets were selected on magnetic resonance images. The proposed targeting procedure allows about 0.3 mm to be obtained as the residual median Euclidean distance between the planned and the desired targets, thus satisfying the surgical accuracy requirements (1 mm), due to the resolution of the diffused medical images. The performances proved to be independent of the robot optical sensor calibration accuracy.
Region of interest processing for iterative reconstruction in x-ray computed tomography

NASA Astrophysics Data System (ADS)

Kopp, Felix K.; Nasirudin, Radin A.; Mei, Kai; Fehringer, Andreas; Pfeiffer, Franz; Rummeny, Ernst J.; Noël, Peter B.

2015-03-01

The recent advancements in the graphics card technology raised the performance of parallel computing and contributed to the introduction of iterative reconstruction methods for x-ray computed tomography in clinical CT scanners. Iterative maximum likelihood (ML) based reconstruction methods are known to reduce image noise and to improve the diagnostic quality of low-dose CT. However, iterative reconstruction of a region of interest (ROI), especially ML based, is challenging. But for some clinical procedures, like cardiac CT, only a ROI is needed for diagnostics. A high-resolution reconstruction of the full field of view (FOV) consumes unnecessary computation effort that results in a slower reconstruction than clinically acceptable. In this work, we present an extension and evaluation of an existing ROI processing algorithm. Especially improvements for the equalization between regions inside and outside of a ROI are proposed. The evaluation was done on data collected from a clinical CT scanner. The performance of the different algorithms is qualitatively and quantitatively assessed. Our solution to the ROI problem provides an increase in signal-to-noise ratio and leads to visually less noise in the final reconstruction. The reconstruction speed of our technique was observed to be comparable with other previous proposed techniques. The development of ROI processing algorithms in combination with iterative reconstruction will provide higher diagnostic quality in the near future.
An iterative reduced field-of-view reconstruction for periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI.

PubMed

Lin, Jyh-Miin; Patterson, Andrew J; Chang, Hing-Chiu; Gillard, Jonathan H; Graves, Martin J

2015-10-01

To propose a new reduced field-of-view (rFOV) strategy for iterative reconstructions in a clinical environment. Iterative reconstructions can incorporate regularization terms to improve the image quality of periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI. However, the large amount of calculations required for full FOV iterative reconstructions has posed a huge computational challenge for clinical usage. By subdividing the entire problem into smaller rFOVs, the iterative reconstruction can be accelerated on a desktop with a single graphic processing unit (GPU). This rFOV strategy divides the iterative reconstruction into blocks, based on the block-diagonal dominant structure. A near real-time reconstruction system was developed for the clinical MR unit, and parallel computing was implemented using the object-oriented model. In addition, the Toeplitz method was implemented on the GPU to reduce the time required for full interpolation. Using the data acquired from the PROPELLER MRI, the reconstructed images were then saved in the digital imaging and communications in medicine format. The proposed rFOV reconstruction reduced the gridding time by 97%, as the total iteration time was 3 s even with multiple processes running. A phantom study showed that the structure similarity index for rFOV reconstruction was statistically superior to conventional density compensation (p < 0.001). In vivo study validated the increased signal-to-noise ratio, which is over four times higher than with density compensation. Image sharpness index was improved using the regularized reconstruction implemented. The rFOV strategy permits near real-time iterative reconstruction to improve the image quality of PROPELLER images. Substantial improvements in image quality metrics were validated in the experiments. The concept of rFOV reconstruction may potentially be applied to other kinds of iterative reconstructions for shortened reconstruction duration.
Vectorized and multitasked solution of the few-group neutron diffusion equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zee, S.K.; Turinsky, P.J.; Shayer, Z.

1989-03-01

A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. Formore » the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7. 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of --61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.« less
An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

NASA Astrophysics Data System (ADS)

Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

2015-07-01

This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media

NASA Astrophysics Data System (ADS)

Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu

2015-07-01

The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.
Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
Parallel solution of the symmetric tridiagonal eigenproblem. Research report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-10-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an IPSC hypercube multiprocessor reveal that Cuppen's method ismore » the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptions of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Parallel solution of the symmetric tridiagonal eigenproblem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-01-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accuratemore » approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
Parallel iterative solution for h and p approximations of the shallow water equations

USGS Publications Warehouse

Barragy, E.J.; Walters, R.A.

1998-01-01

A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.
Accelerating electron tomography reconstruction algorithm ICON with GPU.

PubMed

Chen, Yu; Wang, Zihao; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Sun, Fei; Zhang, Fa

2017-01-01

Electron tomography (ET) plays an important role in studying in situ cell ultrastructure in three-dimensional space. Due to limited tilt angles, ET reconstruction always suffers from the "missing wedge" problem. With a validation procedure, iterative compressed-sensing optimized NUFFT reconstruction (ICON) demonstrates its power in the restoration of validated missing information for low SNR biological ET dataset. However, the huge computational demand has become a major problem for the application of ICON. In this work, we analyzed the framework of ICON and classified the operations of major steps of ICON reconstruction into three types. Accordingly, we designed parallel strategies and implemented them on graphics processing units (GPU) to generate a parallel program ICON-GPU. With high accuracy, ICON-GPU has a great acceleration compared to its CPU version, up to 83.7×, greatly relieving ICON's dependence on computing resource.
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.

Developing Conceptual Understanding and Procedural Skill in Mathematics: An Iterative Process.

ERIC Educational Resources Information Center

Rittle-Johnson, Bethany; Siegler, Robert S.; Alibali, Martha Wagner

2001-01-01

Proposes that conceptual and procedural knowledge develop in an iterative fashion and improved problem representation is one mechanism underlying the relations between them. Two experiments were conducted with 5th and 6th grade students learning about decimal fractions. Results indicate conceptual and procedural knowledge do develop, iteratively,…
Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Iterating between Lessons on Concepts and Procedures Can Improve Mathematics Knowledge

ERIC Educational Resources Information Center

Rittle-Johnson, Bethany; Koedinger, Kenneth

2009-01-01

Background: Knowledge of concepts and procedures seems to develop in an iterative fashion, with increases in one type of knowledge leading to increases in the other type of knowledge. This suggests that iterating between lessons on concepts and procedures may improve learning. Aims: The purpose of the current study was to evaluate the…
Image segmentation by iterative parallel region growing with application to data compression and image analysis

NASA Technical Reports Server (NTRS)

Tilton, James C.

1988-01-01

Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

NASA Astrophysics Data System (ADS)

Qin, Cheng-Zhi; Zhan, Lijun

2012-06-01

As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU-based algorithms based on existing parallelization strategies.
Parallel computation of multigroup reactivity coefficient using iterative method

NASA Astrophysics Data System (ADS)

Susmikanti, Mike; Dewayatna, Winter

2013-09-01

One of the research activities to support the commercial radioisotope production program is a safety research target irradiation FPM (Fission Product Molybdenum). FPM targets form a tube made of stainless steel in which the nuclear degrees of superimposed high-enriched uranium. FPM irradiation tube is intended to obtain fission. The fission material widely used in the form of kits in the world of nuclear medicine. Irradiation FPM tube reactor core would interfere with performance. One of the disorders comes from changes in flux or reactivity. It is necessary to study a method for calculating safety terrace ongoing configuration changes during the life of the reactor, making the code faster became an absolute necessity. Neutron safety margin for the research reactor can be reused without modification to the calculation of the reactivity of the reactor, so that is an advantage of using perturbation method. The criticality and flux in multigroup diffusion model was calculate at various irradiation positions in some uranium content. This model has a complex computation. Several parallel algorithms with iterative method have been developed for the sparse and big matrix solution. The Black-Red Gauss Seidel Iteration and the power iteration parallel method can be used to solve multigroup diffusion equation system and calculated the criticality and reactivity coeficient. This research was developed code for reactivity calculation which used one of safety analysis with parallel processing. It can be done more quickly and efficiently by utilizing the parallel processing in the multicore computer. This code was applied for the safety limits calculation of irradiated targets FPM with increment Uranium.
A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

NASA Technical Reports Server (NTRS)

Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

2005-01-01

A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel e ciency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
Efficient Iterative Methods Applied to the Solution of Transonic Flows

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.

1996-02-01

We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

NASA Astrophysics Data System (ADS)

Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

2016-10-01

Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
Data and Workflow Management Challenges in Global Adjoint Tomography

NASA Astrophysics Data System (ADS)

Lei, W.; Ruan, Y.; Smith, J. A.; Modrak, R. T.; Orsvuran, R.; Krischer, L.; Chen, Y.; Balasubramanian, V.; Hill, J.; Turilli, M.; Bozdag, E.; Lefebvre, M. P.; Jha, S.; Tromp, J.

2017-12-01

It is crucial to take the complete physics of wave propagation into account in seismic tomography to further improve the resolution of tomographic images. The adjoint method is an efficient way of incorporating 3D wave simulations in seismic tomography. However, global adjoint tomography is computationally expensive, requiring thousands of wavefield simulations and massive data processing. Through our collaboration with the Oak Ridge National Laboratory (ORNL) computing group and an allocation on Titan, ORNL's GPU-accelerated supercomputer, we are now performing our global inversions by assimilating waveform data from over 1,000 earthquakes. The first challenge we encountered is dealing with the sheer amount of seismic data. Data processing based on conventional data formats and processing tools (such as SAC), which are not designed for parallel systems, becomes our major bottleneck. To facilitate the data processing procedures, we designed the Adaptive Seismic Data Format (ASDF) and developed a set of Python-based processing tools to replace legacy FORTRAN-based software. These tools greatly enhance reproducibility and accountability while taking full advantage of highly parallel system and showing superior scaling on modern computational platforms. The second challenge is that the data processing workflow contains more than 10 sub-procedures, making it delicate to handle and prone to human mistakes. To reduce human intervention as much as possible, we are developing a framework specifically designed for seismic inversion based on the state-of-the art workflow management research, specifically the Ensemble Toolkit (EnTK), in collaboration with the RADICAL team from Rutgers University. Using the initial developments of the EnTK, we are able to utilize the full computing power of the data processing cluster RHEA at ORNL while keeping human interaction to a minimum and greatly reducing the data processing time. Thanks to all the improvements, we are now able to perform iterations fast enough on more than a 1,000 earthquakes dataset. Starting from model GLAD-M15 (Bozdag et al., 2016), an elastic 3D model with a transversely isotropic upper mantle, we have successfully performed 5 iterations. Our goal is to finish 10 iterations, i.e., generating GLAD M25* by the end of this year.
Multidisciplinary systems optimization by linear decomposition

NASA Technical Reports Server (NTRS)

Sobieski, J.

1984-01-01

In a typical design process major decisions are made sequentially. An illustrated example is given for an aircraft design in which the aerodynamic shape is usually decided first, then the airframe is sized for strength and so forth. An analogous sequence could be laid out for any other major industrial product, for instance, a ship. The loops in the discipline boxes symbolize iterative design improvements carried out within the confines of a single engineering discipline, or subsystem. The loops spanning several boxes depict multidisciplinary design improvement iterations. Omitted for graphical simplicity is parallelism of the disciplinary subtasks. The parallelism is important in order to develop a broad workfront necessary to shorten the design time. If all the intradisciplinary and interdisciplinary iterations were carried out to convergence, the process could yield a numerically optimal design. However, it usually stops short of that because of time and money limitations. This is especially true for the interdisciplinary iterations.
Analysis and Design of ITER 1 MV Core Snubber

NASA Astrophysics Data System (ADS)

Wang, Haitian; Li, Ge

2012-11-01

The core snubber, as a passive protection device, can suppress arc current and absorb stored energy in stray capacitance during the electrical breakdown in accelerating electrodes of ITER NBI. In order to design the core snubber of ITER, the control parameters of the arc peak current have been firstly analyzed by the Fink-Baker-Owren (FBO) method, which are used for designing the DIIID 100 kV snubber. The B-H curve can be derived from the measured voltage and current waveforms, and the hysteresis loss of the core snubber can be derived using the revised parallelogram method. The core snubber can be a simplified representation as an equivalent parallel resistance and inductance, which has been neglected by the FBO method. A simulation code including the parallel equivalent resistance and inductance has been set up. The simulation and experiments result in dramatically large arc shorting currents due to the parallel inductance effect. The case shows that the core snubber utilizing the FBO method gives more compact design.
Dakota, a multilevel parallel object-oriented framework for design optimization, parameter estimation, uncertainty quantification, and sensitivity analysis :

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adams, Brian M.; Ebeida, Mohamed Salah; Eldred, Michael S.

The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a exible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quanti cation with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components requiredmore » for iterative systems analyses, the Dakota toolkit provides a exible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.« less
Towards a Better Distributed Framework for Learning Big Data

DTIC Science & Technology

2017-06-14

UNLIMITED: PB Public Release 13. SUPPLEMENTARY NOTES 14. ABSTRACT This work aimed at solving issues in distributed machine learning. The PI’s team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize a reinforcement policy learning. 15
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

1999-01-01

This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
Complex wet-environments in electronic-structure calculations

NASA Astrophysics Data System (ADS)

Fisicaro, Giuseppe; Genovese, Luigi; Andreussi, Oliviero; Marzari, Nicola; Goedecker, Stefan

The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of an applied electrochemical potentials, including complex electrostatic screening coming from the solvent. In the present work we present a solver to handle both the Generalized Poisson and the Poisson-Boltzmann equation. A preconditioned conjugate gradient (PCG) method has been implemented for the Generalized Poisson and the linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations. On the other hand, a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. The algorithms take advantage of a preconditioning procedure based on the BigDFT Poisson solver for the standard Poisson equation. They exhibit very high accuracy and parallel efficiency, and allow different boundary conditions, including surfaces. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and it will be released as a independent program, suitable for integration in other codes. We present test calculations for large proteins to demonstrate efficiency and performances. This work was done within the PASC and NCCR MARVEL projects. Computer resources were provided by the Swiss National Supercomputing Centre (CSCS) under Project ID s499. LG acknowledges also support from the EXTMOS EU project.
Stability of iterative procedures with errors for approximating common fixed points of a couple of q-contractive-like mappings in Banach spaces

NASA Astrophysics Data System (ADS)

Zeng, Lu-Chuan; Yao, Jen-Chih

2006-09-01

Recently, Agarwal, Cho, Li and Huang [R.P. Agarwal, Y.J. Cho, J. Li, N.J. Huang, Stability of iterative procedures with errors approximating common fixed points for a couple of quasi-contractive mappings in q-uniformly smooth Banach spaces, J. Math. Anal. Appl. 272 (2002) 435-447] introduced the new iterative procedures with errors for approximating the common fixed point of a couple of quasi-contractive mappings and showed the stability of these iterative procedures with errors in Banach spaces. In this paper, we introduce a new concept of a couple of q-contractive-like mappings (q>1) in a Banach space and apply these iterative procedures with errors for approximating the common fixed point of the couple of q-contractive-like mappings. The results established in this paper improve, extend and unify the corresponding ones of Agarwal, Cho, Li and Huang [R.P. Agarwal, Y.J. Cho, J. Li, N.J. Huang, Stability of iterative procedures with errors approximating common fixed points for a couple of quasi-contractive mappings in q-uniformly smooth Banach spaces, J. Math. Anal. Appl. 272 (2002) 435-447], Chidume [C.E. Chidume, Approximation of fixed points of quasi-contractive mappings in Lp spaces, Indian J. Pure Appl. Math. 22 (1991) 273-386], Chidume and Osilike [C.E. Chidume, M.O. Osilike, Fixed points iterations for quasi-contractive maps in uniformly smooth Banach spaces, Bull. Korean Math. Soc. 30 (1993) 201-212], Liu [Q.H. Liu, On Naimpally and Singh's open questions, J. Math. Anal. Appl. 124 (1987) 157-164; Q.H. Liu, A convergence theorem of the sequence of Ishikawa iterates for quasi-contractive mappings, J. Math. Anal. Appl. 146 (1990) 301-305], Osilike [M.O. Osilike, A stable iteration procedure for quasi-contractive maps, Indian J. Pure Appl. Math. 27 (1996) 25-34; M.O. Osilike, Stability of the Ishikawa iteration method for quasi-contractive maps, Indian J. Pure Appl. Math. 28 (1997) 1251-1265] and many others in the literature.
Efficient iterative methods applied to the solution of transonic flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.

1996-02-01

We investigate the use of an inexact Newton`s method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton`s method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GIVIRES method. The preconditionermore » is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton- GIVIRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.« less
Block iterative restoration of astronomical images with the massively parallel processor

NASA Technical Reports Server (NTRS)

Heap, Sara R.; Lindler, Don J.

1987-01-01

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram

2013-04-09

A novel parallel algorithm for non-iterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [K. Bhaskaran-Nair, J.Brabec, E. Aprà, H.J.J. van Dam, J. Pittner, K. Kowalski, J. Chem. Phys. 137, 094112 (2012)] with the possibility of accelerating numerical calculations using graphics processing unit (GPU) is presented. We discuss the performance of this algorithm on the example of the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD (iterative singles and doubles) effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithmmore » is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.« less

Sensitivity calculations for iteratively solved problems

NASA Technical Reports Server (NTRS)

Haftka, R. T.

1985-01-01

The calculation of sensitivity derivatives of solutions of iteratively solved systems of algebraic equations is investigated. A modified finite difference procedure is presented which improves the accuracy of the calculated derivatives. The procedure is demonstrated for a simple algebraic example as well as an element-by-element preconditioned conjugate gradient iterative solution technique applied to truss examples.
A Two Colorable Fourth Order Compact Difference Scheme and Parallel Iterative Solution of the 3D Convection Diffusion Equation

NASA Technical Reports Server (NTRS)

Zhang, Jun; Ge, Lixin; Kouatchou, Jules

2000-01-01

A new fourth order compact difference scheme for the three dimensional convection diffusion equation with variable coefficients is presented. The novelty of this new difference scheme is that it Only requires 15 grid points and that it can be decoupled with two colors. The entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel type iterative method. This is compared with the known 19 point fourth order compact differenCe scheme which requires four colors to decouple the computational grid. Numerical results, with multigrid methods implemented on a shared memory parallel computer, are presented to compare the 15 point and the 19 point fourth order compact schemes.
Parallel Reconstruction Using Null Operations (PRUNO)

PubMed Central

Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

2011-01-01

A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Xyce

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thomquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Parallelization of implicit finite difference schemes in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel

1990-01-01

Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
A survey of packages for large linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Kesheng; Milne, Brent

2000-02-11

This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less
Global Asymptotic Behavior of Iterative Implicit Schemes

NASA Technical Reports Server (NTRS)

Yee, H. C.; Sweby, P. K.

1994-01-01

The global asymptotic nonlinear behavior of some standard iterative procedures in solving nonlinear systems of algebraic equations arising from four implicit linear multistep methods (LMMs) in discretizing three models of 2 x 2 systems of first-order autonomous nonlinear ordinary differential equations (ODEs) is analyzed using the theory of dynamical systems. The iterative procedures include simple iteration and full and modified Newton iterations. The results are compared with standard Runge-Kutta explicit methods, a noniterative implicit procedure, and the Newton method of solving the steady part of the ODEs. Studies showed that aside from exhibiting spurious asymptotes, all of the four implicit LMMs can change the type and stability of the steady states of the differential equations (DEs). They also exhibit a drastic distortion but less shrinkage of the basin of attraction of the true solution than standard nonLMM explicit methods. The simple iteration procedure exhibits behavior which is similar to standard nonLMM explicit methods except that spurious steady-state numerical solutions cannot occur. The numerical basins of attraction of the noniterative implicit procedure mimic more closely the basins of attraction of the DEs and are more efficient than the three iterative implicit procedures for the four implicit LMMs. Contrary to popular belief, the initial data using the Newton method of solving the steady part of the DEs may not have to be close to the exact steady state for convergence. These results can be used as an explanation for possible causes and cures of slow convergence and nonconvergence of steady-state numerical solutions when using an implicit LMM time-dependent approach in computational fluid dynamics.
Parallel Implementation of 3-D Iterative Reconstruction With Intra-Thread Update for the jPET-D4

NASA Astrophysics Data System (ADS)

Lam, Chih Fung; Yamaya, Taiga; Obi, Takashi; Yoshida, Eiji; Inadama, Naoko; Shibuya, Kengo; Nishikido, Fumihiko; Murayama, Hideo

2009-02-01

One way to speed-up iterative image reconstruction is by parallel computing with a computer cluster. However, as the number of computing threads increases, parallel efficiency decreases due to network transfer delay. In this paper, we proposed a method to reduce data transfer between computing threads by introducing an intra-thread update. The update factor is collected from each slave thread and a global image is updated as usual in the first K sub-iteration. In the rest of the sub-iterations, the global image is only updated at an interval which is controlled by a parameter L. In between that interval, the intra-thread update is carried out whereby an image update is performed in each slave thread locally. We investigated combinations of K and L parameters based on parallel implementation of RAMLA for the jPET-D4 scanner. Our evaluation used four workstations with a total of 16 slave threads. Each slave thread calculated a different set of LORs which are divided according to ring difference numbers. We assessed image quality of the proposed method with a hotspot simulation phantom. The figure of merit was the full-width-half-maximum of hotspots and the background normalized standard deviation. At an optimum K and L setting, we did not find significant change in the output images. We also applied the proposed method to a Hoffman phantom experiment and found the difference due to intra-thread update was negligible. With the intra-thread update, computation time could be reduced by about 23%.
Real-space processing of helical filaments in SPARX

PubMed Central

Behrmann, Elmar; Tao, Guozhi; Stokes, David L.; Egelman, Edward H.; Raunser, Stefan; Penczek, Pawel A.

2012-01-01

We present a major revision of the iterative helical real-space refinement (IHRSR) procedure and its implementation in the SPARX single particle image processing environment. We built on over a decade of experience with IHRSR helical structure determination and we took advantage of the flexible SPARX infrastructure to arrive at an implementation that offers ease of use, flexibility in designing helical structure determination strategy, and high computational efficiency. We introduced the 3D projection matching code which now is able to work with non-cubic volumes, the geometry better suited for long helical filaments, we enhanced procedures for establishing helical symmetry parameters, and we parallelized the code using distributed memory paradigm. Additional feature includes a graphical user interface that facilitates entering and editing of parameters controlling the structure determination strategy of the program. In addition, we present a novel approach to detect and evaluate structural heterogeneity due to conformer mixtures that takes advantage of helical structure redundancy. PMID:22248449
New Bandwidth Efficient Parallel Concatenated Coding Schemes

NASA Technical Reports Server (NTRS)

Denedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.

1996-01-01

We propose a new solution to parallel concatenation of trellis codes with multilevel amplitude/phase modulations and a suitable iterative decoding structure. Examples are given for throughputs 2 bits/sec/Hz with 8PSK and 16QAM signal constellations.
NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism Correctness

DTIC Science & Technology

2011-12-16

other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a...Lab affiliates National Instruments, NEC, Nokia , NVIDIA, and Samsung. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism...concurrently update x, some of these CAS’s will fail and those parallel loop iterations will recompute their updates to x and try again. Consider the parallel
Learning control system design based on 2-D theory - An application to parallel link manipulator

NASA Technical Reports Server (NTRS)

Geng, Z.; Carroll, R. L.; Lee, J. D.; Haynes, L. H.

1990-01-01

An approach to iterative learning control system design based on two-dimensional system theory is presented. A two-dimensional model for the iterative learning control system which reveals the connections between learning control systems and two-dimensional system theory is established. A learning control algorithm is proposed, and the convergence of learning using this algorithm is guaranteed by two-dimensional stability. The learning algorithm is applied successfully to the trajectory tracking control problem for a parallel link robot manipulator. The excellent performance of this learning algorithm is demonstrated by the computer simulation results.
Soft-output decoding algorithms in iterative decoding of turbo codes

NASA Technical Reports Server (NTRS)

Benedetto, S.; Montorsi, G.; Divsalar, D.; Pollara, F.

1996-01-01

In this article, we present two versions of a simplified maximum a posteriori decoding algorithm. The algorithms work in a sliding window form, like the Viterbi algorithm, and can thus be used to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation is also given of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes (turbo codes). The performances of the two algorithms are compared on the basis of a powerful rate 1/3 parallel concatenated code. Basic circuits to implement the simplified a posteriori decoding algorithm using lookup tables, and two further approximations (linear and threshold), with a very small penalty, to eliminate the need for lookup tables are proposed.
An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions

NASA Technical Reports Server (NTRS)

Peters, B. C., Jr.; Walker, H. F.

1975-01-01

A general iterative procedure is given for determining the consistent maximum likelihood estimates of normal distributions. In addition, a local maximum of the log-likelihood function, Newtons's method, a method of scoring, and modifications of these procedures are discussed.
Comparison of multihardware parallel implementations for a phase unwrapping algorithm

NASA Astrophysics Data System (ADS)

Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

2018-04-01

Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
On-line range images registration with GPGPU

NASA Astrophysics Data System (ADS)

Będkowski, J.; Naruniec, J.

2013-03-01

This paper concerns implementation of algorithms in the two important aspects of modern 3D data processing: data registration and segmentation. Solution proposed for the first topic is based on the 3D space decomposition, while the latter on image processing and local neighbourhood search. Data processing is implemented by using NVIDIA compute unified device architecture (NIVIDIA CUDA) parallel computation. The result of the segmentation is a coloured map where different colours correspond to different objects, such as walls, floor and stairs. The research is related to the problem of collecting 3D data with a RGB-D camera mounted on a rotated head, to be used in mobile robot applications. Performance of the data registration algorithm is aimed for on-line processing. The iterative closest point (ICP) approach is chosen as a registration method. Computations are based on the parallel fast nearest neighbour search. This procedure decomposes 3D space into cubic buckets and, therefore, the time of the matching is deterministic. First technique of the data segmentation uses accele-rometers integrated with a RGB-D sensor to obtain rotation compensation and image processing method for defining pre-requisites of the known categories. The second technique uses the adapted nearest neighbour search procedure for obtaining normal vectors for each range point.
Effect of contrast enhancement prior to iteration procedure on image correction for soft x-ray projection microscopy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jamsranjav, Erdenetogtokh, E-mail: ja.erdenetogtokh@gmail.com; Shiina, Tatsuo, E-mail: shiina@faculity.chiba-u.jp; Kuge, Kenichi

2016-01-28

Soft X-ray microscopy is well recognized as a powerful tool of high-resolution imaging for hydrated biological specimens. Projection type of it has characteristics of easy zooming function, simple optical layout and so on. However the image is blurred by the diffraction of X-rays, leading the spatial resolution to be worse. In this study, the blurred images have been corrected by an iteration procedure, i.e., Fresnel and inverse Fresnel transformations repeated. This method was confirmed by earlier studies to be effective. Nevertheless it was not enough to some images showing too low contrast, especially at high magnification. In the present study,more » we tried a contrast enhancement method to make the diffraction fringes clearer prior to the iteration procedure. The method was effective to improve the images which were not successful by iteration procedure only.« less
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

NASA Astrophysics Data System (ADS)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.

2018-01-01

We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
Solution of a tridiagonal system of equations on the finite element machine

NASA Technical Reports Server (NTRS)

Bostic, S. W.

1984-01-01

Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.

Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

NASA Astrophysics Data System (ADS)

Marx, Alain; Lütjens, Hinrich

2017-03-01

A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
Modified Chebyshev Picard Iteration for Efficient Numerical Integration of Ordinary Differential Equations

NASA Astrophysics Data System (ADS)

Macomber, B.; Woollands, R. M.; Probe, A.; Younes, A.; Bai, X.; Junkins, J.

2013-09-01

Modified Chebyshev Picard Iteration (MCPI) is an iterative numerical method for approximating solutions of linear or non-linear Ordinary Differential Equations (ODEs) to obtain time histories of system state trajectories. Unlike other step-by-step differential equation solvers, the Runge-Kutta family of numerical integrators for example, MCPI approximates long arcs of the state trajectory with an iterative path approximation approach, and is ideally suited to parallel computation. Orthogonal Chebyshev Polynomials are used as basis functions during each path iteration; the integrations of the Picard iteration are then done analytically. Due to the orthogonality of the Chebyshev basis functions, the least square approximations are computed without matrix inversion; the coefficients are computed robustly from discrete inner products. As a consequence of discrete sampling and weighting adopted for the inner product definition, Runge phenomena errors are minimized near the ends of the approximation intervals. The MCPI algorithm utilizes a vector-matrix framework for computational efficiency. Additionally, all Chebyshev coefficients and integrand function evaluations are independent, meaning they can be simultaneously computed in parallel for further decreased computational cost. Over an order of magnitude speedup from traditional methods is achieved in serial processing, and an additional order of magnitude is achievable in parallel architectures. This paper presents a new MCPI library, a modular toolset designed to allow MCPI to be easily applied to a wide variety of ODE systems. Library users will not have to concern themselves with the underlying mathematics behind the MCPI method. Inputs are the boundary conditions of the dynamical system, the integrand function governing system behavior, and the desired time interval of integration, and the output is a time history of the system states over the interval of interest. Examples from the field of astrodynamics are presented to compare the output from the MCPI library to current state-of-practice numerical integration methods. It is shown that MCPI is capable of out-performing the state-of-practice in terms of computational cost and accuracy.
Parallel fast multipole boundary element method applied to computational homogenization

NASA Astrophysics Data System (ADS)

Ptaszny, Jacek

2018-01-01

In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problem is developed and applied to the computational homogenization of a solid containing spherical voids. The system of equation is solved by using the GMRES iterative solver. The boundary of the body is dicretized by using the quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of number of threads is examined.
Parallel iterative methods for sparse linear and nonlinear equations

NASA Technical Reports Server (NTRS)

Saad, Youcef

1989-01-01

As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable, when solving linear as well as nonlinear systems of equations. There has been several different approaches taken to adapt iterative methods for supercomputers. Some of these approaches are discussed and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desai, Ajit; Khalil, Mohammad; Pettit, Chris

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1986-01-01

Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

NASA Technical Reports Server (NTRS)

Padovan, Joseph; Krishna, Lala; Gute, Douglas

1997-01-01

Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.
An iterative transformation procedure for numerical solution of flutter and similar characteristics-value problems

NASA Technical Reports Server (NTRS)

Gossard, Myron L

1952-01-01

An iterative transformation procedure suggested by H. Wielandt for numerical solution of flutter and similar characteristic-value problems is presented. Application of this procedure to ordinary natural-vibration problems and to flutter problems is shown by numerical examples. Comparisons of computed results with experimental values and with results obtained by other methods of analysis are made.
Adapting high-level language programs for parallel processing using data flow

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Application Of Iterative Reconstruction Techniques To Conventional Circular Tomography

NASA Astrophysics Data System (ADS)

Ghosh Roy, D. N.; Kruger, R. A.; Yih, B. C.; Del Rio, S. P.; Power, R. L.

1985-06-01

Two "point-by-point" iteration procedures, namely, Iterative Least Square Technique (ILST) and Simultaneous Iterative Reconstructive Technique (SIRT) were applied to classical circular tomographic reconstruction. The technique of tomosynthetic DSA was used in forming the tomographic images. Reconstructions of a dog's renal and neck anatomy are presented.
Computational Issues in Damping Identification for Large Scale Problems

NASA Technical Reports Server (NTRS)

Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.

1997-01-01

Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least squares method. Numerical simulations have been performed on multiple degree-of-freedom models to test the effectiveness of the algorithm and the usefulness of parallel computation for the problems. High Performance Fortran is used to parallelize the algorithm. Tests were performed using the IBM-SP2 at NASA Ames Research Center. The least squares method tested incurs high communication costs, which reduces the benefit of high performance computing. This method's memory requirement grows at a very rapid rate meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can he seen for problems of 100+ degrees-of-freedom.
Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

NASA Technical Reports Server (NTRS)

Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

1989-01-01

Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
Parallel heuristics for scalable community detection

DOE PAGES

Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth

2015-08-14

Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less
Forward marching procedure for separated boundary-layer flows

NASA Technical Reports Server (NTRS)

Carter, J. E.; Wornom, S. F.

1975-01-01

A forward-marching procedure for separated boundary-layer flows which permits the rapid and accurate solution of flows of limited extent is presented. The streamwise convection of vorticity in the reversed flow region is neglected, and this approximation is incorporated into a previously developed (Carter, 1974) inverse boundary-layer procedure. The equations are solved by the Crank-Nicolson finite-difference scheme in which column iteration is carried out at each streamwise station. Instabilities encountered in the column iterations are removed by introducing timelike terms in the finite-difference equations. This provides both unconditional diagonal dominance and a column iterative scheme, found to be stable using the von Neumann stability analysis.
Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald.

PubMed

Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

2014-02-28

In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald

PubMed Central

Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

2015-01-01

In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE PAGES

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...

2017-09-14

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Iterative load-balancing method with multigrid level relaxation for particle simulation with short-range interactions

NASA Astrophysics Data System (ADS)

Furuichi, Mikito; Nishiura, Daisuke

2017-10-01

We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less

Performance Analysis of Iterative Channel Estimation and Multiuser Detection in Multipath DS-CDMA Channels

NASA Astrophysics Data System (ADS)

Li, Husheng; Betz, Sharon M.; Poor, H. Vincent

2007-05-01

This paper examines the performance of decision feedback based iterative channel estimation and multiuser detection in channel coded aperiodic DS-CDMA systems operating over multipath fading channels. First, explicit expressions describing the performance of channel estimation and parallel interference cancellation based multiuser detection are developed. These results are then combined to characterize the evolution of the performance of a system that iterates among channel estimation, multiuser detection and channel decoding. Sufficient conditions for convergence of this system to a unique fixed point are developed.
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.

2005-01-01

A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategics and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate thc steps in the formulation and large-scale (l,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a duster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environmental and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
Performance and capacity analysis of Poisson photon-counting based Iter-PIC OCDMA systems.

PubMed

Li, Lingbin; Zhou, Xiaolin; Zhang, Rong; Zhang, Dingchen; Hanzo, Lajos

2013-11-04

In this paper, an iterative parallel interference cancellation (Iter-PIC) technique is developed for optical code-division multiple-access (OCDMA) systems relying on shot-noise limited Poisson photon-counting reception. The novel semi-analytical tool of extrinsic information transfer (EXIT) charts is used for analysing both the bit error rate (BER) performance as well as the channel capacity of these systems and the results are verified by Monte Carlo simulations. The proposed Iter-PIC OCDMA system is capable of achieving two orders of magnitude BER improvements and a 0.1 nats of capacity improvement over the conventional chip-level OCDMA systems at a coding rate of 1/10.
Analysis of Monte Carlo accelerated iterative methods for sparse linear systems: Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

DOE PAGES

Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

2017-03-05

Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions, Addendum

NASA Technical Reports Server (NTRS)

Peters, B. C., Jr.; Walker, H. F.

1975-01-01

New results and insights concerning a previously published iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions were discussed. It was shown that the procedure converges locally to the consistent maximum likelihood estimate as long as a specified parameter is bounded between two limits. Bound values were given to yield optimal local convergence.
Extending substructure based iterative solvers to multiple load and repeated analyses

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1993-01-01

Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
Sparse-view photoacoustic tomography using virtual parallel-projections and spatially adaptive filtering

NASA Astrophysics Data System (ADS)

Wang, Yihan; Lu, Tong; Wan, Wenbo; Liu, Lingling; Zhang, Songhe; Li, Jiao; Zhao, Huijuan; Gao, Feng

2018-02-01

To fully realize the potential of photoacoustic tomography (PAT) in preclinical and clinical applications, rapid measurements and robust reconstructions are needed. Sparse-view measurements have been adopted effectively to accelerate the data acquisition. However, since the reconstruction from the sparse-view sampling data is challenging, both of the effective measurement and the appropriate reconstruction should be taken into account. In this study, we present an iterative sparse-view PAT reconstruction scheme where a virtual parallel-projection concept matching for the proposed measurement condition is introduced to help to achieve the "compressive sensing" procedure of the reconstruction, and meanwhile the spatially adaptive filtering fully considering the a priori information of the mutually similar blocks existing in natural images is introduced to effectively recover the partial unknown coefficients in the transformed domain. Therefore, the sparse-view PAT images can be reconstructed with higher quality compared with the results obtained by the universal back-projection (UBP) algorithm in the same sparse-view cases. The proposed approach has been validated by simulation experiments, which exhibits desirable performances in image fidelity even from a small number of measuring positions.
A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis

ERIC Educational Resources Information Center

Cao, Mengyang; Tay, Louis; Liu, Yaowu

2017-01-01

This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…
Domain decomposition preconditioners for the spectral collocation method

NASA Technical Reports Server (NTRS)

Quarteroni, Alfio; Sacchilandriani, Giovanni

1988-01-01

Several block iteration preconditioners are proposed and analyzed for the solution of elliptic problems by spectral collocation methods in a region partitioned into several rectangles. It is shown that convergence is achieved with a rate which does not depend on the polynomial degree of the spectral solution. The iterative methods here presented can be effectively implemented on multiprocessor systems due to their high degree of parallelism.
A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments.

PubMed

Fisicaro, G; Genovese, L; Andreussi, O; Marzari, N; Goedecker, S

2016-01-07

The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence, the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equations for neutral and ionic solutions, respectively. In the present work, solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented for the solution of the generalized Poisson equation and the linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of the ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency and allow for the treatment of periodic, free, and slab boundary conditions. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes.
A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fisicaro, G., E-mail: giuseppe.fisicaro@unibas.ch; Goedecker, S.; Genovese, L.

2016-01-07

The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence, the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equations for neutral and ionic solutions, respectively. In the present work, solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented for the solution of the generalized Poisson equation and themore » linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of the ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency and allow for the treatment of periodic, free, and slab boundary conditions. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes.« less
Iterative pass optimization of sequence data

NASA Technical Reports Server (NTRS)

Wheeler, Ward C.

2003-01-01

The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated the considerable effort placed in multiple sequence alignment procedures. Wheeler in 1996 proposed a heuristic method, direct optimization, to calculate cladogram costs without the intervention of multiple sequence alignment. This method, though more efficient in time and more effective in cladogram length than many alignment-based procedures, greedily optimizes nodes based on descendent information only. In their proposal of an exact multiple alignment solution, Sankoff et al. in 1976 described a heuristic procedure--the iterative improvement method--to create alignments at internal nodes by solving a series of median problems. The combination of a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure, provides an algorithm that frequently results in superior (i.e., lower) cladogram costs. This iterative pass optimization is both computation and memory intensive, but economies can be made to reduce this burden. An example in arthropod systematics is discussed. c2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

PubMed

Hielscher, Andreas H; Bartel, Sebastian

2004-02-01

Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE PAGES

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

2016-11-07

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Dissipation, Voltage Profile and Levy Dragon in a Special Ladder Network

ERIC Educational Resources Information Center

Ucak, C.

2009-01-01

A ladder network constructed by an elementary two-terminal network consisting of a parallel resistor-inductor block in series with a parallel resistor-capacitor block sometimes is said to have a non-dispersive dissipative response. This special ladder network is created iteratively by replacing the elementary two-terminal network in place of the…
Improved evaluation of optical depth components from Langley plot data

NASA Technical Reports Server (NTRS)

Biggar, S. F.; Gellman, D. I.; Slater, P. N.

1990-01-01

A simple, iterative procedure to determine the optical depth components of the extinction optical depth measured by a solar radiometer is presented. Simulated data show that the iterative procedure improves the determination of the exponent of a Junge law particle size distribution. The determination of the optical depth due to aerosol scattering is improved as compared to a method which uses only two points from the extinction data. The iterative method was used to determine spectral optical depth components for June 11-13, 1988 during the MAC III experiment.
Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.

PubMed

Palkowski, Marek; Bielecki, Wlodzimierz

2018-01-15

RNA folding is an ongoing compute-intensive task of bioinformatics. Parallelization and improving code locality for this kind of algorithms is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques are within the iteration space slicing framework - the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem at generating parallel tiled code is defining a proper tile size and tile dimension which impact parallelism degree and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (parameters are variables defining tile size). With this purpose, we first generate two nonparametric tiled codes with different fixed tile sizes but with the same code structure and then derive a general affine model, which describes all integer factors available in expressions of those codes. Using this model and known integer factors present in the mentioned expressions (they define the left-hand side of the model), we find unknown integers in this model for each integer factor available in the same fixed tiled code position and replace in this code expressions, including integer factors, with those including parameters. Then we use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover in a given search space the best tile size and tile dimension maximizing target code performance. For a given search space, the presented approach allows us to choose the best tile size and tile dimension in parallel tiled code implementing Nussinov's RNA folding. Experimental results, received on modern Intel multi-core processors, demonstrate that this code outperforms known closely related implementations when the length of RNA strands is bigger than 2500.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*

PubMed Central

Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.

2012-01-01

This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200

Myria: Scalable Analytics as a Service

NASA Astrophysics Data System (ADS)

Howe, B.; Halperin, D.; Whitaker, A.

2014-12-01

At the UW eScience Institute, we're working to empower non-experts, especially in the sciences, to write and use data-parallel algorithms. To this end, we are building Myria, a web-based platform for scalable analytics and data-parallel programming. Myria's internal model of computation is the relational algebra extended with iteration, such that every program is inherently data-parallel, just as every query in a database is inherently data-parallel. But unlike databases, iteration is a first class concept, allowing us to express machine learning tasks, graph traversal tasks, and more. Programs can be expressed in a number of languages and can be executed on a number of execution environments, but we emphasize a particular language called MyriaL that supports both imperative and declarative styles and a particular execution engine called MyriaX that uses an in-memory column-oriented representation and asynchronous iteration. We deliver Myria over the web as a service, providing an editor, performance analysis tools, and catalog browsing features in a single environment. We find that this web-based "delivery vector" is critical in reaching non-experts: they are insulated from irrelevant effort technical work associated with installation, configuration, and resource management. The MyriaX backend, one of several execution runtimes we support, is a main-memory, column-oriented, RDBMS-on-the-worker system that supports cyclic data flows as a first-class citizen and has been shown to outperform competitive systems on 100-machine cluster sizes. I will describe the Myria system, give a demo, and present some new results in large-scale oceanographic microbiology.
The multigrid preconditioned conjugate gradient method

NASA Technical Reports Server (NTRS)

Tatebe, Osamu

1993-01-01

A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, it is considered a necessary condition of the multigrid preconditioner in order to satisfy requirements of a preconditioner of the PCG method. Next numerical experiments show a behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in point of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.
A proposal for amending administrative law to facilitate adaptive management

NASA Astrophysics Data System (ADS)

Craig, Robin K.; Ruhl, J. B.; Brown, Eleanor D.; Williams, Byron K.

2017-07-01

In this article we examine how federal agencies use adaptive management. In order for federal agencies to implement adaptive management more successfully, administrative law must adapt to adaptive management, and we propose changes in administrative law that will help to steer the current process out of a dead end. Adaptive management is a form of structured decision making that is widely used in natural resources management. It involves specific steps integrated in an iterative process for adjusting management actions as new information becomes available. Theoretical requirements for adaptive management notwithstanding, federal agency decision making is subject to the requirements of the federal Administrative Procedure Act, and state agencies are subject to the states’ parallel statutes. We argue that conventional administrative law has unnecessarily shackled effective use of adaptive management. We show that through a specialized ‘adaptive management track’ of administrative procedures, the core values of administrative law—especially public participation, judicial review, and finality— can be implemented in ways that allow for more effective adaptive management. We present and explain draft model legislation (the Model Adaptive Management Procedure Act) that would create such a track for the specific types of agency decision making that could benefit from adaptive management.
A proposal for amending administrative law to facilitate adaptive management

USGS Publications Warehouse

Craig, Robin K.; Ruhl, J.B.; Brown, Eleanor D.; Williams, Byron K.

2017-01-01

In this article we examine how federal agencies use adaptive management. In order for federal agencies to implement adaptive management more successfully, administrative law must adapt to adaptive management, and we propose changes in administrative law that will help to steer the current process out of a dead end. Adaptive management is a form of structured decision making that is widely used in natural resources management. It involves specific steps integrated in an iterative process for adjusting management actions as new information becomes available. Theoretical requirements for adaptive management notwithstanding, federal agency decision making is subject to the requirements of the federal Administrative Procedure Act, and state agencies are subject to the states' parallel statutes. We argue that conventional administrative law has unnecessarily shackled effective use of adaptive management. We show that through a specialized 'adaptive management track' of administrative procedures, the core values of administrative law—especially public participation, judicial review, and finality— can be implemented in ways that allow for more effective adaptive management. We present and explain draft model legislation (the Model Adaptive Management Procedure Act) that would create such a track for the specific types of agency decision making that could benefit from adaptive management.
Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Sankar, Lakshmi N.; Hixon, Duane

1991-01-01

Efficient iterative solution methods are being developed for the numerical solution of two- and three-dimensional compressible Navier-Stokes equations. Iterative time marching methods have several advantages over classical multi-step explicit time marching schemes, and non-iterative implicit time marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes. Thus, the extra work required by iterative schemes can also be designed to perform efficiently on current and future generation scalable, missively parallel machines. An obvious candidate for iteratively solving the system of coupled nonlinear algebraic equations arising in CFD applications is the Newton method. Newton's method was implemented in existing finite difference and finite volume methods. Depending on the complexity of the problem, the number of Newton iterations needed per step to solve the discretized system of equations can, however, vary dramatically from a few to several hundred. Another popular approach based on the classical conjugate gradient method, known as the GMRES (Generalized Minimum Residual) algorithm is investigated. The GMRES algorithm was used in the past by a number of researchers for solving steady viscous and inviscid flow problems with considerable success. Here, the suitability of this algorithm is investigated for solving the system of nonlinear equations that arise in unsteady Navier-Stokes solvers at each time step. Unlike the Newton method which attempts to drive the error in the solution at each and every node down to zero, the GMRES algorithm only seeks to minimize the L2 norm of the error. In the GMRES algorithm the changes in the flow properties from one time step to the next are assumed to be the sum of a set of orthogonal vectors. By choosing the number of vectors to a reasonably small value N (between 5 and 20) the work required for advancing the solution from one time step to the next may be kept to (N+1) times that of a noniterative scheme. Many of the operations required by the GMRES algorithm such as matrix-vector multiplies, matrix additions and subtractions can all be vectorized and parallelized efficiently.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*

PubMed Central

Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.

2014-01-01

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to minx∈ℝn ‖Ax − b‖2, where A ∈ ℝm × n with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
Performance Implications of Synchronization Support for Parallel FORTRAN Programs

DTIC Science & Technology

1991-06-17

applications we used in this study are BDNA and FLO52. BDNA is a molecular dy- I namics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all 32 I I 100 BDNA -4 FLO52 -I 80 3 CumuilatQe percentage of3
A highly parallel multigrid-like method for the solution of the Euler equations

NASA Technical Reports Server (NTRS)

Tuminaro, Ray S.

1989-01-01

We consider a highly parallel multigrid-like method for the solution of the two dimensional steady Euler equations. The new method, introduced as filtering multigrid, is similar to a standard multigrid scheme in that convergence on the finest grid is accelerated by iterations on coarser grids. In the filtering method, however, additional fine grid subproblems are processed concurrently with coarse grid computations to further accelerate convergence. These additional problems are obtained by splitting the residual into a smooth and an oscillatory component. The smooth component is then used to form a coarse grid problem (similar to standard multigrid) while the oscillatory component is used for a fine grid subproblem. The primary advantage in the filtering approach is that fewer iterations are required and that most of the additional work per iteration can be performed in parallel with the standard coarse grid computations. We generalize the filtering algorithm to a version suitable for nonlinear problems. We emphasize that this generalization is conceptually straight-forward and relatively easy to implement. In particular, no explicit linearization (e.g., formation of Jacobians) needs to be performed (similar to the FAS multigrid approach). We illustrate the nonlinear version by applying it to the Euler equations, and presenting numerical results. Finally, a performance evaluation is made based on execution time models and convergence information obtained from numerical experiments.
The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter

DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particlesmore » and in grid related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.« less
The role of simulation in the design of a neural network chip

NASA Technical Reports Server (NTRS)

Desai, Utpal; Roppel, Thaddeus A.; Padgett, Mary L.

1993-01-01

An iterative, simulation-based design procedure for a neural network chip is introduced. For this design procedure, the goal is to produce a chip layout for a neural network in which the weights are determined by transistor gate width-to-length ratios. In a given iteration, the current layout is simulated using the circuit simulator SPICE, and layout adjustments are made based on conventional gradient-decent methods. After the iteration converges, the chip is fabricated. Monte Carlo analysis is used to predict the effect of statistical fabrication process variations on the overall performance of the neural network chip.
Evaluating the performance of two neutron spectrum unfolding codes based on iterative procedures and artificial neural networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ortiz-Rodriguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.

In this work the performance of two neutron spectrum unfolding codes based on iterative procedures and artificial neural networks is evaluated. The first one code based on traditional iterative procedures and called Neutron spectrometry and dosimetry from the Universidad Autonoma de Zacatecas (NSDUAZ) use the SPUNIT iterative algorithm and was designed to unfold neutron spectrum and calculate 15 dosimetric quantities and 7 IAEA survey meters. The main feature of this code is the automated selection of the initial guess spectrum trough a compendium of neutron spectrum compiled by the IAEA. The second one code known as Neutron spectrometry and dosimetrymore » with artificial neural networks (NDSann) is a code designed using neural nets technology. The artificial intelligence approach of neural net does not solve mathematical equations. By using the knowledge stored at synaptic weights on a neural net properly trained, the code is capable to unfold neutron spectrum and to simultaneously calculate 15 dosimetric quantities, needing as entrance data, only the rate counts measured with a Bonner spheres system. Similarities of both NSDUAZ and NSDann codes are: they follow the same easy and intuitive user's philosophy and were designed in a graphical interface under the LabVIEW programming environment. Both codes unfold the neutron spectrum expressed in 60 energy bins, calculate 15 dosimetric quantities and generate a full report in HTML format. Differences of these codes are: NSDUAZ code was designed using classical iterative approaches and needs an initial guess spectrum in order to initiate the iterative procedure. In NSDUAZ, a programming routine was designed to calculate 7 IAEA instrument survey meters using the fluence-dose conversion coefficients. NSDann code use artificial neural networks for solving the ill-conditioned equation system of neutron spectrometry problem through synaptic weights of a properly trained neural network. Contrary to iterative procedures, in neural net approach it is possible to reduce the rate counts used to unfold the neutron spectrum. To evaluate these codes a computer tool called Neutron Spectrometry and dosimetry computer tool was designed. The results obtained with this package are showed. The codes here mentioned are freely available upon request to the authors.« less
Evaluating the performance of two neutron spectrum unfolding codes based on iterative procedures and artificial neural networks

NASA Astrophysics Data System (ADS)

Ortiz-Rodríguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.; Solís Sánches, L. O.; Miranda, R. Castañeda; Cervantes Viramontes, J. M.; Vega-Carrillo, H. R.

2013-07-01

In this work the performance of two neutron spectrum unfolding codes based on iterative procedures and artificial neural networks is evaluated. The first one code based on traditional iterative procedures and called Neutron spectrometry and dosimetry from the Universidad Autonoma de Zacatecas (NSDUAZ) use the SPUNIT iterative algorithm and was designed to unfold neutron spectrum and calculate 15 dosimetric quantities and 7 IAEA survey meters. The main feature of this code is the automated selection of the initial guess spectrum trough a compendium of neutron spectrum compiled by the IAEA. The second one code known as Neutron spectrometry and dosimetry with artificial neural networks (NDSann) is a code designed using neural nets technology. The artificial intelligence approach of neural net does not solve mathematical equations. By using the knowledge stored at synaptic weights on a neural net properly trained, the code is capable to unfold neutron spectrum and to simultaneously calculate 15 dosimetric quantities, needing as entrance data, only the rate counts measured with a Bonner spheres system. Similarities of both NSDUAZ and NSDann codes are: they follow the same easy and intuitive user's philosophy and were designed in a graphical interface under the LabVIEW programming environment. Both codes unfold the neutron spectrum expressed in 60 energy bins, calculate 15 dosimetric quantities and generate a full report in HTML format. Differences of these codes are: NSDUAZ code was designed using classical iterative approaches and needs an initial guess spectrum in order to initiate the iterative procedure. In NSDUAZ, a programming routine was designed to calculate 7 IAEA instrument survey meters using the fluence-dose conversion coefficients. NSDann code use artificial neural networks for solving the ill-conditioned equation system of neutron spectrometry problem through synaptic weights of a properly trained neural network. Contrary to iterative procedures, in neural net approach it is possible to reduce the rate counts used to unfold the neutron spectrum. To evaluate these codes a computer tool called Neutron Spectrometry and dosimetry computer tool was designed. The results obtained with this package are showed. The codes here mentioned are freely available upon request to the authors.
Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

NASA Astrophysics Data System (ADS)

Arteaga, Santiago Egido

1998-12-01

The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the linearization strategies considered and whose computational cost is negligible. The algebraic properties of these systems depend on both the discretization and nonlinear method used. We study in detail the positive definiteness and skewsymmetry of the advection submatrices (essentially, convection-diffusion problems). We propose a discretization based on a new trilinear form for Newton's method. We solve the linear systems using three Krylov subspace methods, GMRES, QMR and TFQMR, and compare the advantages of each. Our emphasis is on parallel algorithms, and so we consider preconditioners suitable for parallel computers such as line variants of the Jacobi and Gauss- Seidel methods, alternating direction implicit methods, and Chebyshev and least squares polynomial preconditioners. These work well for moderate viscosities (moderate Reynolds number). For small viscosities we show that effective parallel solution of the advection subproblem is a critical factor to improve performance. Implementation details on a CM-5 are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise tomore » extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)« less
2D Seismic Imaging of Elastic Parameters by Frequency Domain Full Waveform Inversion

NASA Astrophysics Data System (ADS)

Brossier, R.; Virieux, J.; Operto, S.

2008-12-01

Thanks to recent advances in parallel computing, full waveform inversion is today a tractable seismic imaging method to reconstruct physical parameters of the earth interior at different scales ranging from the near- surface to the deep crust. We present a massively parallel 2D frequency-domain full-waveform algorithm for imaging visco-elastic media from multi-component seismic data. The forward problem (i.e. the resolution of the frequency-domain 2D PSV elastodynamics equations) is based on low-order Discontinuous Galerkin (DG) method (P0 and/or P1 interpolations). Thanks to triangular unstructured meshes, the DG method allows accurate modeling of both body waves and surface waves in case of complex topography for a discretization of 10 to 15 cells per shear wavelength. The frequency-domain DG system is solved efficiently for multiple sources with the parallel direct solver MUMPS. The local inversion procedure (i.e. minimization of residuals between observed and computed data) is based on the adjoint-state method which allows to efficiently compute the gradient of the objective function. Applying the inversion hierarchically from the low frequencies to the higher ones defines a multiresolution imaging strategy which helps convergence towards the global minimum. In place of expensive Newton algorithm, the combined use of the diagonal terms of the approximate Hessian matrix and optimization algorithms based on quasi-Newton methods (Conjugate Gradient, LBFGS, ...) allows to improve the convergence of the iterative inversion. The distribution of forward problem solutions over processors driven by a mesh partitioning performed by METIS allows to apply most of the inversion in parallel. We shall present the main features of the parallel modeling/inversion algorithm, assess its scalability and illustrate its performances with realistic synthetic case studies.
Design and implementation of highly parallel pipelined VLSI systems

NASA Astrophysics Data System (ADS)

Delange, Alphonsus Anthonius Jozef

A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
Application of microarray analysis on computer cluster and cloud platforms.

PubMed

Bernau, C; Boulesteix, A-L; Knaus, J

2013-01-01

Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
Parallel processing in finite element structural analysis

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1987-01-01

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Decomposition of timed automata for solving scheduling problems

NASA Astrophysics Data System (ADS)

Nishi, Tatsushi; Wakatake, Masato

2014-03-01

A decomposition algorithm for scheduling problems based on timed automata (TA) model is proposed. The problem is represented as an optimal state transition problem for TA. The model comprises of the parallel composition of submodels such as jobs and resources. The procedure of the proposed methodology can be divided into two steps. The first step is to decompose the TA model into several submodels by using decomposable condition. The second step is to combine individual solution of subproblems for the decomposed submodels by the penalty function method. A feasible solution for the entire model is derived through the iterated computation of solving the subproblem for each submodel. The proposed methodology is applied to solve flowshop and jobshop scheduling problems. Computational experiments demonstrate the effectiveness of the proposed algorithm compared with a conventional TA scheduling algorithm without decomposition.
Aztec user`s guide. Version 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1995-10-01

Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparsemore » unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.« less

Iterative nonlinear joint transform correlation for the detection of objects in cluttered scenes

NASA Astrophysics Data System (ADS)

Haist, Tobias; Tiziani, Hans J.

1999-03-01

An iterative correlation technique with digital image processing in the feedback loop for the detection of small objects in cluttered scenes is proposed. A scanning aperture is combined with the method in order to improve the immunity against noise and clutter. Multiple reference objects or different views of one object are processed in parallel. We demonstrate the method by detecting a noisy and distorted face in a crowd with a nonlinear joint transform correlator.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an interactive library that greatly simplifies the parrallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matricesmore » for parallel solutions.« less
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

NASA Technical Reports Server (NTRS)

Straeter, T. A.; Markos, A. T.

1975-01-01

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Improved interpretation of satellite altimeter data using genetic algorithms

NASA Technical Reports Server (NTRS)

Messa, Kenneth; Lybanon, Matthew

1992-01-01

Genetic algorithms (GA) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GA's have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
ICON-MIC: Implementing a CPU/MIC Collaboration Parallel Framework for ICON on Tianhe-2 Supercomputer.

PubMed

Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa

2018-03-01

Electron tomography (ET) is an important technique for studying the three-dimensional structures of the biological ultrastructure. Recently, ET has reached sub-nanometer resolution for investigating the native and conformational dynamics of macromolecular complexes by combining with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for a low-signal-to-noise ratio biological ET dataset. However, the huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented a parallel acceleration technology ICON-many integrated core (MIC) on Xeon Phi cards to address the huge computational demand of ICON. During this step, we parallelize the element-wise matrix operations and use the efficient summation of a matrix to reduce the cost of matrix computation. We also developed parallel versions of NUFFT on MIC to achieve a high acceleration of ICON by using more efficient fast Fourier transform (FFT) calculation. We then proposed a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on Tianhe-2 supercomputer. Experimental results using two different datasets show that ICON-MIC has high accuracy in biological specimens under different noise levels and a significant acceleration, up to 13.3 × , compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on Tianhe-2 supercomputer.
Improved Savitzky-Golay-method-based fluorescence subtraction algorithm for rapid recovery of Raman spectra.

PubMed

Chen, Kun; Zhang, Hongyuan; Wei, Haoyun; Li, Yan

2014-08-20

In this paper, we propose an improved subtraction algorithm for rapid recovery of Raman spectra that can substantially reduce the computation time. This algorithm is based on an improved Savitzky-Golay (SG) iterative smoothing method, which involves two key novel approaches: (a) the use of the Gauss-Seidel method and (b) the introduction of a relaxation factor into the iterative procedure. By applying a novel successive relaxation (SG-SR) iterative method to the relaxation factor, additional improvement in the convergence speed over the standard Savitzky-Golay procedure is realized. The proposed improved algorithm (the RIA-SG-SR algorithm), which uses SG-SR-based iteration instead of Savitzky-Golay iteration, has been optimized and validated with a mathematically simulated Raman spectrum, as well as experimentally measured Raman spectra from non-biological and biological samples. The method results in a significant reduction in computing cost while yielding consistent rejection of fluorescence and noise for spectra with low signal-to-fluorescence ratios and varied baselines. In the simulation, RIA-SG-SR achieved 1 order of magnitude improvement in iteration number and 2 orders of magnitude improvement in computation time compared with the range-independent background-subtraction algorithm (RIA). Furthermore the computation time of the experimentally measured raw Raman spectrum processing from skin tissue decreased from 6.72 to 0.094 s. In general, the processing of the SG-SR method can be conducted within dozens of milliseconds, which can provide a real-time procedure in practical situations.
Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

PubMed

Liang, Faming; Kim, Jinsu; Song, Qifan

2016-01-01

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
A Parallel Fast Sweeping Method for the Eikonal Equation

NASA Astrophysics Data System (ADS)

Baker, B.

2017-12-01

Recently, there has been an exciting emergence of probabilistic methods for travel time tomography. Unlike gradient-based optimization strategies, probabilistic tomographic methods are resistant to becoming trapped in a local minimum and provide a much better quantification of parameter resolution than, say, appealing to ray density or performing checkerboard reconstruction tests. The benefits associated with random sampling methods however are only realized by successive computation of predicted travel times in, potentially, strongly heterogeneous media. To this end this abstract is concerned with expediting the solution of the Eikonal equation. While many Eikonal solvers use a fast marching method, the proposed solver will use the iterative fast sweeping method because the eight fixed sweep orderings in each iteration are natural targets for parallelization. To reduce the number of iterations and grid points required the high-accuracy finite difference stencil of Nobel et al., 2014 is implemented. A directed acyclic graph (DAG) is created with a priori knowledge of the sweep ordering and finite different stencil. By performing a topological sort of the DAG sets of independent nodes are identified as candidates for concurrent updating. Additionally, the proposed solver will also address scalability during earthquake relocation, a necessary step in local and regional earthquake tomography and a barrier to extending probabilistic methods from active source to passive source applications, by introducing an asynchronous parallel forward solve phase for all receivers in the network. Synthetic examples using the SEG over-thrust model will be presented.
A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

PubMed Central

Kim, Jinsu; Song, Qifan

2016-01-01

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469
Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu

2006-04-17

TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less
Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.

NASA Astrophysics Data System (ADS)

Liu, Y.; Li, Y.

2016-12-01

We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multi-sources and parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm and model perturbations at each iteration step are obtained by using the Inexact Conjugate Gradient iteration method. Synthetic test inversions are presented.
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
Optimal Averages for Nonlinear Signal Decompositions - Another Alternative for Empirical Mode Decomposition

DTIC Science & Technology

2014-10-01

nonlinear and non-stationary signals. It aims at decomposing a signal, via an iterative sifting procedure, into several intrinsic mode functions ...stationary signals. It aims at decomposing a signal, via an iterative sifting procedure into several intrinsic mode functions (IMFs), and each of the... function , optimization. 1 Introduction It is well known that nonlinear and non-stationary signal analysis is important and difficult. His- torically
Transport synthetic acceleration with opposing reflecting boundary conditions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zika, M.R.; Adams, M.L.

2000-02-01

The transport synthetic acceleration (TSA) scheme is extended to problems with opposing reflecting boundary conditions. This synthetic method employs a simplified transport operator as its low-order approximation. A procedure is developed that allows the use of the conjugate gradient (CG) method to solve the resulting low-order system of equations. Several well-known transport iteration algorithms are cast in a linear algebraic form to show their equivalence to standard iterative techniques. Source iteration in the presence of opposing reflecting boundary conditions is shown to be equivalent to a (poorly) preconditioned stationary Richardson iteration, with the preconditioner defined by the method of iteratingmore » on the incident fluxes on the reflecting boundaries. The TSA method (and any synthetic method) amounts to a further preconditioning of the Richardson iteration. The presence of opposing reflecting boundary conditions requires special consideration when developing a procedure to realize the CG method for the proposed system of equations. The CG iteration may be applied only to symmetric positive definite matrices; this condition requires the algebraic elimination of the boundary angular corrections from the low-order equations. As a consequence of this elimination, evaluating the action of the resulting matrix on an arbitrary vector involves two transport sweeps and a transmission iteration. Results of applying the acceleration scheme to a simple test problem are presented.« less
Iterative image reconstruction for PROPELLER-MRI using the nonuniform fast fourier transform.

PubMed

Tamhane, Ashish A; Anastasio, Mark A; Gui, Minzhi; Arfanakis, Konstantinos

2010-07-01

To investigate an iterative image reconstruction algorithm using the nonuniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction) MRI. Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it with that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased signal to noise ratio, reduced artifacts, for similar spatial resolution, compared with gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter, the new reconstruction technique may provide PROPELLER images with improved image quality compared with conventional gridding. (c) 2010 Wiley-Liss, Inc.
Iterative Image Reconstruction for PROPELLER-MRI using the NonUniform Fast Fourier Transform

PubMed Central

Tamhane, Ashish A.; Anastasio, Mark A.; Gui, Minzhi; Arfanakis, Konstantinos

2013-01-01

Purpose To investigate an iterative image reconstruction algorithm using the non-uniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping parallEL Lines with Enhanced Reconstruction) MRI. Materials and Methods Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it to that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. Results It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased SNR, reduced artifacts, for similar spatial resolution, compared to gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. Conclusion An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter the new reconstruction technique may provide PROPELLER images with improved image quality compared to conventional gridding. PMID:20578028
Exploring the Ability of a Coarse-grained Potential to Describe the Stress-strain Response of Glassy Polystyrene

DTIC Science & Technology

2012-10-01

using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS

Real-Time, Multiple, Pan/Tilt/Zoom, Computer Vision Tracking, and 3D Position Estimating System for Unmanned Aerial System Metrology

DTIC Science & Technology

2013-10-18

of the enclosed tasks plus the last parallel task for a total of five parallel tasks for one iteration, i). for j = 1…N for i = 1… 8 Figure...drizzling juices culminating in a state of salivating desire to cut a piece and enjoy. On the other hand, the smell could be that of a pungent, unpleasant
Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

NASA Astrophysics Data System (ADS)

Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

2016-07-01

Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
Integrated modelling of steady-state scenarios and heating and current drive mixes for ITER

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murakami, Masanori; Park, Jin Myung; Giruzzi, G.

2011-01-01

Recent progress on ITER steady-state (SS) scenario modelling by the ITPA-IOS group is reviewed. Code-to-code benchmarks as the IOS group's common activities for the two SS scenarios (weak shear scenario and internal transport barrier scenario) are discussed in terms of transport, kinetic profiles, and heating and current drive (CD) sources using various transport codes. Weak magnetic shear scenarios integrate the plasma core and edge by combining a theory-based transport model (GLF23) with scaled experimental boundary profiles. The edge profiles (at normalized radius rho = 0.8-1.0) are adopted from an edge-localized mode-averaged analysis of a DIII-D ITER demonstration discharge. A fullymore » noninductive SS scenario is achieved with fusion gain Q = 4.3, noninductive fraction f(NI) = 100%, bootstrap current fraction f(BS) = 63% and normalized beta beta(N) = 2.7 at plasma current I(p) = 8MA and toroidal field B(T) = 5.3 T using ITER day-1 heating and CD capability. Substantial uncertainties come from outside the radius of setting the boundary conditions (rho = 0.8). The present simulation assumed that beta(N)(rho) at the top of the pedestal (rho = 0.91) is about 25% above the peeling-ballooning threshold. ITER will have a challenge to achieve the boundary, considering different operating conditions (T(e)/T(i) approximate to 1 and density peaking). Overall, the experimentally scaled edge is an optimistic side of the prediction. A number of SS scenarios with different heating and CD mixes in a wide range of conditions were explored by exploiting the weak-shear steady-state solution procedure with the GLF23 transport model and the scaled experimental edge. The results are also presented in the operation space for DT neutron power versus stationary burn pulse duration with assumed poloidal flux availability at the beginning of stationary burn, indicating that the long pulse operation goal (3000s) at I(p) = 9 MA is possible. Source calculations in these simulations have been revised for electron cyclotron current drive including parallel momentum conservation effects and for neutral beam current drive with finite orbit and magnetic pitch effects.« less
Methodes iteratives paralleles: Applications en neutronique et en mecanique des fluides

NASA Astrophysics Data System (ADS)

Qaddouri, Abdessamad

Dans cette these, le calcul parallele est applique successivement a la neutronique et a la mecanique des fluides. Dans chacune de ces deux applications, des methodes iteratives sont utilisees pour resoudre le systeme d'equations algebriques resultant de la discretisation des equations du probleme physique. Dans le probleme de neutronique, le calcul des matrices des probabilites de collision (PC) ainsi qu'un schema iteratif multigroupe utilisant une methode inverse de puissance sont parallelises. Dans le probleme de mecanique des fluides, un code d'elements finis utilisant un algorithme iteratif du type GMRES preconditionne est parallelise. Cette these est presentee sous forme de six articles suivis d'une conclusion. Les cinq premiers articles traitent des applications en neutronique, articles qui representent l'evolution de notre travail dans ce domaine. Cette evolution passe par un calcul parallele des matrices des PC et un algorithme multigroupe parallele teste sur un probleme unidimensionnel (article 1), puis par deux algorithmes paralleles l'un mutiregion l'autre multigroupe, testes sur des problemes bidimensionnels (articles 2--3). Ces deux premieres etapes sont suivies par l'application de deux techniques d'acceleration, le rebalancement neutronique et la minimisation du residu aux deux algorithmes paralleles (article 4). Finalement, on a mis en oeuvre l'algorithme multigroupe et le calcul parallele des matrices des PC sur un code de production DRAGON ou les tests sont plus realistes et peuvent etre tridimensionnels (article 5). Le sixieme article (article 6), consacre a l'application a la mecanique des fluides, traite la parallelisation d'un code d'elements finis FES ou le partitionneur de graphe METIS et la librairie PSPARSLIB sont utilises.
Generalizations of the Alternating Direction Method of Multipliers for Large-Scale and Distributed Optimization

DTIC Science & Technology

2014-05-01

exact one is solved later — as- signed as step 5 of Algorithm 2 — because at each iteration , the ADMM updates the variables in the Gauss - Seidel ...Jacobi ADMM (see Algo- rithm 5 below). Unlike the Gauss - Seidel ADMM, the Jacobi ADMM updates all the 70 blocks in parallel at every iteration : xk+1i...that extending ADMM straightforwardly from the classic Gauss - Seidel setting to the Jacobi setting, from two blocks to multiple blocks, will preserve
Inversion of potential field data using the finite element method on parallel computers

NASA Astrophysics Data System (ADS)

Gross, L.; Altinay, C.; Shaw, S.

2015-11-01

In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs namely for the potential fields, for the adjoint defects and for the application of the preconditioner. In extension to the traditional discrete formulation the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by weighting regularization and cross-gradient but is independent of the resolution of PDE discretization and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.
Scalable splitting algorithms for big-data interferometric imaging in the SKA era

NASA Astrophysics Data System (ADS)

Onose, Alexandru; Carrillo, Rafael E.; Repetti, Audrey; McEwen, Jason D.; Thiran, Jean-Philippe; Pesquet, Jean-Christophe; Wiaux, Yves

2016-11-01

In the context of next-generation radio telescopes, like the Square Kilometre Array (SKA), the efficient processing of large-scale data sets is extremely important. Convex optimization tasks under the compressive sensing framework have recently emerged and provide both enhanced image reconstruction quality and scalability to increasingly larger data sets. We focus herein mainly on scalability and propose two new convex optimization algorithmic structures able to solve the convex optimization tasks arising in radio-interferometric imaging. They rely on proximal splitting and forward-backward iterations and can be seen, by analogy, with the CLEAN major-minor cycle, as running sophisticated CLEAN-like iterations in parallel in multiple data, prior, and image spaces. Both methods support any convex regularization function, in particular, the well-studied ℓ1 priors promoting image sparsity in an adequate domain. Tailored for big-data, they employ parallel and distributed computations to achieve scalability, in terms of memory and computational requirements. One of them also exploits randomization, over data blocks at each iteration, offering further flexibility. We present simulation results showing the feasibility of the proposed methods as well as their advantages compared to state-of-the-art algorithmic solvers. Our MATLAB code is available online on GitHub.
Upwind relaxation methods for the Navier-Stokes equations using inner iterations

NASA Technical Reports Server (NTRS)

Taylor, Arthur C., III; Ng, Wing-Fai; Walters, Robert W.

1992-01-01

A subsonic and a supersonic problem are respectively treated by an upwind line-relaxation algorithm for the Navier-Stokes equations using inner iterations to accelerate steady-state solution convergence and thereby minimize CPU time. While the ability of the inner iterative procedure to mimic the quadratic convergence of the direct solver method is attested to in both test problems, some of the nonquadratic inner iterative results are noted to have been more efficient than the quadratic. In the more successful, supersonic test case, inner iteration required only about 65 percent of the line-relaxation method-entailed CPU time.
Performance evaluation of canny edge detection on a tiled multicore architecture

NASA Astrophysics Data System (ADS)

Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

2011-01-01

In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP,1 and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that for the same number of threads, programmer implemented, domain decomposition exhibits higher speed-ups than the compiler managed, loop-level parallelism implemented with OpenMP.
Spotting the difference in molecular dynamics simulations of biomolecules

NASA Astrophysics Data System (ADS)

Sakuraba, Shun; Kono, Hidetoshi

2016-08-01

Comparing two trajectories from molecular simulations conducted under different conditions is not a trivial task. In this study, we apply a method called Linear Discriminant Analysis with ITERative procedure (LDA-ITER) to compare two molecular simulation results by finding the appropriate projection vectors. Because LDA-ITER attempts to determine a projection such that the projections of the two trajectories do not overlap, the comparison does not suffer from a strong anisotropy, which is an issue in protein dynamics. LDA-ITER is applied to two test cases: the T4 lysozyme protein simulation with or without a point mutation and the allosteric protein PDZ2 domain of hPTP1E with or without a ligand. The projection determined by the method agrees with the experimental data and previous simulations. The proposed procedure, which complements existing methods, is a versatile analytical method that is specialized to find the "difference" between two trajectories.
Self-consistent hybrid functionals for solids: a fully-automated implementation

NASA Astrophysics Data System (ADS)

Erba, A.

2017-08-01

A fully-automated algorithm for the determination of the system-specific optimal fraction of exact exchange in self-consistent hybrid functionals of the density-functional-theory is illustrated, as implemented into the public Crystal program. The exchange fraction of this new class of functionals is self-consistently updated proportionally to the inverse of the dielectric response of the system within an iterative procedure (Skone et al 2014 Phys. Rev. B 89, 195112). Each iteration of the present scheme, in turn, implies convergence of a self-consistent-field (SCF) and a coupled-perturbed-Hartree-Fock/Kohn-Sham (CPHF/KS) procedure. The present implementation, beside improving the user-friendliness of self-consistent hybrids, exploits the unperturbed and electric-field perturbed density matrices from previous iterations as guesses for subsequent SCF and CPHF/KS iterations, which is documented to reduce the overall computational cost of the whole process by a factor of 2.
Further investigation on "A multiplicative regularization for force reconstruction"

NASA Astrophysics Data System (ADS)

Aucejo, M.; De Smet, O.

2018-05-01

We have recently proposed a multiplicative regularization to reconstruct mechanical forces acting on a structure from vibration measurements. This method does not require any selection procedure for choosing the regularization parameter, since the amount of regularization is automatically adjusted throughout an iterative resolution process. The proposed iterative algorithm has been developed with performance and efficiency in mind, but it is actually a simplified version of a full iterative procedure not described in the original paper. The present paper aims at introducing the full resolution algorithm and comparing it with its simplified version in terms of computational efficiency and solution accuracy. In particular, it is shown that both algorithms lead to very similar identified solutions.
Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy.

PubMed

Zelyak, O; Fallone, B G; St-Aubin, J

2017-12-14

Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.
Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy

NASA Astrophysics Data System (ADS)

Zelyak, O.; Fallone, B. G.; St-Aubin, J.

2018-01-01

Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.
Corrigendum to "Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy".

PubMed

Zelyak, Oleksandr; Fallone, B Gino; St-Aubin, Joel

2018-03-12

Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation. © 2018 Institute of Physics and Engineering in Medicine.
Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less
Iterated unscented Kalman filter for phase unwrapping of interferometric fringes.

PubMed

Xie, Xianming

2016-08-22

A fresh phase unwrapping algorithm based on iterated unscented Kalman filter is proposed to estimate unambiguous unwrapped phase of interferometric fringes. This method is the result of combining an iterated unscented Kalman filter with a robust phase gradient estimator based on amended matrix pencil model, and an efficient quality-guided strategy based on heap sort. The iterated unscented Kalman filter that is one of the most robust methods under the Bayesian theorem frame in non-linear signal processing so far, is applied to perform simultaneously noise suppression and phase unwrapping of interferometric fringes for the first time, which can simplify the complexity and the difficulty of pre-filtering procedure followed by phase unwrapping procedure, and even can remove the pre-filtering procedure. The robust phase gradient estimator is used to efficiently and accurately obtain phase gradient information from interferometric fringes, which is needed for the iterated unscented Kalman filtering phase unwrapping model. The efficient quality-guided strategy is able to ensure that the proposed method fast unwraps wrapped pixels along the path from the high-quality area to the low-quality area of wrapped phase images, which can greatly improve the efficiency of phase unwrapping. Results obtained from synthetic data and real data show that the proposed method can obtain better solutions with an acceptable time consumption, with respect to some of the most used algorithms.
Fault Tolerant Parallel Implementations of Iterative Algorithms for Optimal Control Problems

DTIC Science & Technology

1988-01-21

p/.V)] steps, but did not discuss any specific parallel implementation. Gajski [51 improved upon this result by performing the SIMD computation in...N = p2. our approach reduces to that of [51, except that Gajski presents the coefficient computation and partial solution phases as a single...8217>. the SIMD algo- rithm presented by Gajski [5] can be most efficiently mapped to a unidirec- tional ring network with broadcasting capability. Based
PRIM: An Efficient Preconditioning Iterative Reweighted Least Squares Method for Parallel Brain MRI Reconstruction.

PubMed

Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou

2018-02-08

The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. While joint total variation (JTV) regularized model has been demonstrated as a powerful tool in increasing sampling speed for pMRI, however, the major bottleneck is the inefficiency of the optimization method. While all present state-of-the-art optimizations for the JTV model could only reach a sublinear convergence rate, in this paper, we squeeze the performance by proposing a linear-convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI regarding both accuracy and efficiency compared with state-of-the-art methods.
SPIRiT: Iterative Self-consistent Parallel Imaging Reconstruction from Arbitrary k-Space

PubMed Central

Lustig, Michael; Pauly, John M.

2010-01-01

A new approach to autocalibrating, coil-by-coil parallel imaging reconstruction is presented. It is a generalized reconstruction framework based on self consistency. The reconstruction problem is formulated as an optimization that yields the most consistent solution with the calibration and acquisition data. The approach is general and can accurately reconstruct images from arbitrary k-space sampling patterns. The formulation can flexibly incorporate additional image priors such as off-resonance correction and regularization terms that appear in compressed sensing. Several iterative strategies to solve the posed reconstruction problem in both image and k-space domain are presented. These are based on a projection over convex sets (POCS) and a conjugate gradient (CG) algorithms. Phantom and in-vivo studies demonstrate efficient reconstructions from undersampled Cartesian and spiral trajectories. Reconstructions that include off-resonance correction and nonlinear ℓ1-wavelet regularization are also demonstrated. PMID:20665790

Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

NASA Astrophysics Data System (ADS)

Yang, Xiang; Mittal, Rajat

2017-03-01

In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.
Parallel medicinal chemistry approaches to selective HDAC1/HDAC2 inhibitor (SHI-1:2) optimization.

PubMed

Kattar, Solomon D; Surdi, Laura M; Zabierek, Anna; Methot, Joey L; Middleton, Richard E; Hughes, Bethany; Szewczak, Alexander A; Dahlberg, William K; Kral, Astrid M; Ozerova, Nicole; Fleming, Judith C; Wang, Hongmei; Secrist, Paul; Harsch, Andreas; Hamill, Julie E; Cruz, Jonathan C; Kenific, Candia M; Chenard, Melissa; Miller, Thomas A; Berk, Scott C; Tempest, Paul

2009-02-15

The successful application of both solid and solution phase library synthesis, combined with tight integration into the medicinal chemistry effort, resulted in the efficient optimization of a novel structural series of selective HDAC1/HDAC2 inhibitors by the MRL-Boston Parallel Medicinal Chemistry group. An initial lead from a small parallel library was found to be potent and selective in biochemical assays. Advanced compounds were the culmination of iterative library design and possess excellent biochemical and cellular potency, as well as acceptable PK and efficacy in animal models.
A stopping criterion to halt iterations at the Richardson-Lucy deconvolution of radiographic images

NASA Astrophysics Data System (ADS)

Almeida, G. L.; Silvani, M. I.; Souza, E. S.; Lopes, R. T.

2015-07-01

Radiographic images, as any experimentally acquired ones, are affected by spoiling agents which degrade their final quality. The degradation caused by agents of systematic character, can be reduced by some kind of treatment such as an iterative deconvolution. This approach requires two parameters, namely the system resolution and the best number of iterations in order to achieve the best final image. This work proposes a novel procedure to estimate the best number of iterations, which replaces the cumbersome visual inspection by a comparison of numbers. These numbers are deduced from the image histograms, taking into account the global difference G between them for two subsequent iterations. The developed algorithm, including a Richardson-Lucy deconvolution procedure has been embodied into a Fortran program capable to plot the 1st derivative of G as the processing progresses and to stop it automatically when this derivative - within the data dispersion - reaches zero. The radiograph of a specially chosen object acquired with thermal neutrons from the Argonauta research reactor at Institutode Engenharia Nuclear - CNEN, Rio de Janeiro, Brazil, have undergone this treatment with fair results.
Convergence of an iterative procedure for large-scale static analysis of structural components

NASA Technical Reports Server (NTRS)

Austin, F.; Ojalvo, I. U.

1976-01-01

The paper proves convergence of an iterative procedure for calculating the deflections of built-up component structures which can be represented as consisting of a dominant, relatively stiff primary structure and a less stiff secondary structure, which may be composed of one or more substructures that are not connected to one another but are all connected to the primary structure. The iteration consists in estimating the deformation of the primary structure in the absence of the secondary structure on the assumption that all mechanical loads are applied directly to the primary structure. The j-th iterate primary structure deflections at the interface are imposed on the secondary structure, and the boundary loads required to produce these deflections are computed. The cycle is completed by applying the interface reaction to the primary structure and computing its updated deflections. It is shown that the mathematical condition for convergence of this procedure is that the maximum eigenvalue of the equation relating primary-structure deflection to imposed secondary-structure deflection be less than unity, which is shown to correspond with the physical requirement that the secondary structure be more flexible at the interface boundary.
Research at ITER towards DEMO: Specific reactor diagnostic studies to be carried out on ITER

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krasilnikov, A. V.; Kaschuck, Y. A.; Vershkov, V. A.

2014-08-21

In ITER diagnostics will operate in the very hard radiation environment of fusion reactor. Extensive technology studies are carried out during development of the ITER diagnostics and procedures of their calibration and remote handling. Results of these studies and practical application of the developed diagnostics on ITER will provide the direct input to DEMO diagnostic development. The list of DEMO measurement requirements and diagnostics will be determined during ITER experiments on the bases of ITER plasma physics results and success of particular diagnostic application in reactor-like ITER plasma. Majority of ITER diagnostic already passed the conceptual design phase and representmore » the state of the art in fusion plasma diagnostic development. The number of related to DEMO results of ITER diagnostic studies such as design and prototype manufacture of: neutron and γ–ray diagnostics, neutral particle analyzers, optical spectroscopy including first mirror protection and cleaning technics, reflectometry, refractometry, tritium retention measurements etc. are discussed.« less
Research at ITER towards DEMO: Specific reactor diagnostic studies to be carried out on ITER

NASA Astrophysics Data System (ADS)

Krasilnikov, A. V.; Kaschuck, Y. A.; Vershkov, V. A.; Petrov, A. A.; Petrov, V. G.; Tugarinov, S. N.

2014-08-01

In ITER diagnostics will operate in the very hard radiation environment of fusion reactor. Extensive technology studies are carried out during development of the ITER diagnostics and procedures of their calibration and remote handling. Results of these studies and practical application of the developed diagnostics on ITER will provide the direct input to DEMO diagnostic development. The list of DEMO measurement requirements and diagnostics will be determined during ITER experiments on the bases of ITER plasma physics results and success of particular diagnostic application in reactor-like ITER plasma. Majority of ITER diagnostic already passed the conceptual design phase and represent the state of the art in fusion plasma diagnostic development. The number of related to DEMO results of ITER diagnostic studies such as design and prototype manufacture of: neutron and γ-ray diagnostics, neutral particle analyzers, optical spectroscopy including first mirror protection and cleaning technics, reflectometry, refractometry, tritium retention measurements etc. are discussed.
Unsupervised iterative detection of land mines in highly cluttered environments.

PubMed

Batman, Sinan; Goutsias, John

2003-01-01

An unsupervised iterative scheme is proposed for land mine detection in heavily cluttered scenes. This scheme is based on iterating hybrid multispectral filters that consist of a decorrelating linear transform coupled with a nonlinear morphological detector. Detections extracted from the first pass are used to improve results in subsequent iterations. The procedure stops after a predetermined number of iterations. The proposed scheme addresses several weaknesses associated with previous adaptations of morphological approaches to land mine detection. Improvement in detection performance, robustness with respect to clutter inhomogeneities, a completely unsupervised operation, and computational efficiency are the main highlights of the method. Experimental results reveal excellent performance.
Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness

NASA Astrophysics Data System (ADS)

Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.

2018-03-01

This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational Awareness computer cluster at the LASR Lab, Texas A&M University. We demonstrate the power of our tool by solving a highly parallel example problem, that is the generation of extremal field maps for optimal spacecraft rendezvous (and eventual orbit debris removal). In addition we demonstrate the need for including perturbative effects in simulations for satellite tracking or data association. The unified Lambert tool is ideal for but not limited to space situational awareness applications.
Exploiting Vector and Multicore Parallelsim for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that thesemore » schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.« less
Multiscale optical simulation settings: challenging applications handled with an iterative ray-tracing FDTD interface method.

PubMed

Leiner, Claude; Nemitz, Wolfgang; Schweitzer, Susanne; Kuna, Ladislav; Wenzl, Franz P; Hartmann, Paul; Satzinger, Valentin; Sommer, Christian

2016-03-20

We show that with an appropriate combination of two optical simulation techniques-classical ray-tracing and the finite difference time domain method-an optical device containing multiple diffractive and refractive optical elements can be accurately simulated in an iterative simulation approach. We compare the simulation results with experimental measurements of the device to discuss the applicability and accuracy of our iterative simulation procedure.
Parallel AFSA algorithm accelerating based on MIC architecture

NASA Astrophysics Data System (ADS)

Zhou, Junhao; Xiao, Hong; Huang, Yifan; Li, Yongzhao; Xu, Yuanrui

2017-05-01

Analysis AFSA past for solving the traveling salesman problem, the algorithm efficiency is often a big problem, and the algorithm processing method, it does not fully responsive to the characteristics of the traveling salesman problem to deal with, and therefore proposes a parallel join improved AFSA process. The simulation with the current TSP known optimal solutions were analyzed, the results showed that the AFSA iterations improved less, on the MIC cards doubled operating efficiency, efficiency significantly.
Air Force Maui Optical and Supercomputing Site (AMOS) Application Briefs 2004

DTIC Science & Technology

2004-01-01

Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection...17. LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 60 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b. ABSTRACT...SOR methods, the work in parallel matrix iterative methods is very relevant to the parallelization of a CA. The Moore neighborhood used in this work
Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

NASA Astrophysics Data System (ADS)

Chacon, L.; Finn, J. M.; Knoll, D. A.

2000-10-01

Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
An iterative technique to stabilize a linear time invariant multivariable system with output feedback

NASA Technical Reports Server (NTRS)

Sankaran, V.

1974-01-01

An iterative procedure for determining the constant gain matrix that will stabilize a linear constant multivariable system using output feedback is described. The use of this procedure avoids the transformation of variables which is required in other procedures. For the case in which the product of the output and input vector dimensions is greater than the number of states of the plant, general solution is given. In the case in which the states exceed the product of input and output vector dimensions, a least square solution which may not be stable in all cases is presented. The results are illustrated with examples.
A parallel variable metric optimization algorithm

NASA Technical Reports Server (NTRS)

Straeter, T. A.

1973-01-01

An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

NASA Astrophysics Data System (ADS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-07-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
FPGA implementation of low complexity LDPC iterative decoder

NASA Astrophysics Data System (ADS)

Verma, Shivani; Sharma, Sanjay

2016-07-01

Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family.
Drifts, currents, and power scrape-off width in SOLPS-ITER modeling of DIII-D

DOE PAGES

Meier, E. T.; Goldston, R. J.; Kaveeva, E. G.; ...

2016-12-27

The effects of drifts and associated flows and currents on the width of the parallel heat flux channel (λ q) in the tokamak scrape-off layer (SOL) are analyzed using the SOLPS-ITER 2D fluid transport code. Motivation is supplied by Goldston’s heuristic drift (HD) model for λ q, which yields the same approximately inverse poloidal magnetic field dependence seen in multi-machine regression. The analysis, focusing on a DIII-D H-mode discharge, reveals HD-like features, including comparable density and temperature fall-off lengths in the SOL, and up-down ion pressure asymmetry that allows net cross-separatrix ion magnetic drift flux to exceed net anomalous ionmore » flux. In experimentally relevant high-recycling cases, scans of both toroidal and poloidal magnetic field (B tor and B pol) are conducted, showing minimal λ q dependence on either component of the field. Insensitivity to B tor is expected, and suggests that SOLPS-ITER is effectively capturing some aspects of HD physics. Absence of λ q dependence on B pol, however, is inconsistent with both the HD model and experimental results. As a result, the inconsistency is attributed to strong variation in the parallel Mach number, which violates one of the premises of the HD model.« less
Detection of mouse liver cancer via a parallel iterative shrinkage method in hybrid optical/microcomputed tomography imaging

NASA Astrophysics Data System (ADS)

Wu, Ping; Liu, Kai; Zhang, Qian; Xue, Zhenwen; Li, Yongbao; Ning, Nannan; Yang, Xin; Li, Xingde; Tian, Jie

2012-12-01

Liver cancer is one of the most common malignant tumors worldwide. In order to enable the noninvasive detection of small liver tumors in mice, we present a parallel iterative shrinkage (PIS) algorithm for dual-modality tomography. It takes advantage of microcomputed tomography and multiview bioluminescence imaging, providing anatomical structure and bioluminescence intensity information to reconstruct the size and location of tumors. By incorporating prior knowledge of signal sparsity, we associate some mathematical strategies including specific smooth convex approximation, an iterative shrinkage operator, and affine subspace with the PIS method, which guarantees the accuracy, efficiency, and reliability for three-dimensional reconstruction. Then an in vivo experiment on the bead-implanted mouse has been performed to validate the feasibility of this method. The findings indicate that a tiny lesion less than 3 mm in diameter can be localized with a position bias no more than 1 mm the computational efficiency is one to three orders of magnitude faster than the existing algorithms; this approach is robust to the different regularization parameters and the lp norms. Finally, we have applied this algorithm to another in vivo experiment on an HCCLM3 orthotopic xenograft mouse model, which suggests the PIS method holds the promise for practical applications of whole-body cancer detection.

Non-iterative distance constraints enforcement for cloth drapes simulation

NASA Astrophysics Data System (ADS)

Hidajat, R. L. L. G.; Wibowo, Arifin, Z.; Suyitno

2016-03-01

A cloth simulation represents the behavior of cloth objects such as flag, tablecloth, or even garments has application in clothing animation for games and virtual shops. Elastically deformable models have widely used to provide realistic and efficient simulation, however problem of overstretching is encountered. We introduce a new cloth simulation algorithm that replaces iterative distance constraint enforcement steps with non-iterative ones for preventing over stretching in a spring-mass system for cloth modeling. Our method is based on a simple position correction procedure applied at one end of a spring. In our experiments, we developed a rectangle cloth model which is initially at a horizontal position with one point is fixed, and it is allowed to drape by its own weight. Our simulation is able to achieve a plausible cloth drapes as in reality. This paper aims to demonstrate the reliability of our approach to overcome overstretches while decreasing the computational cost of the constraint enforcement process due to an iterative procedure that is eliminated.
Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moryakov, A. V., E-mail: sailor@orc.ru

2016-12-15

An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller unit of tasks, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing run- time. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
Evaluation of noise limits to improve image processing in soft X-ray projection microscopy.

PubMed

Jamsranjav, Erdenetogtokh; Kuge, Kenichi; Ito, Atsushi; Kinjo, Yasuhito; Shiina, Tatsuo

2017-03-03

Soft X-ray microscopy has been developed for high resolution imaging of hydrated biological specimens due to the availability of water window region. In particular, a projection type microscopy has advantages in wide viewing area, easy zooming function and easy extensibility to computed tomography (CT). The blur of projection image due to the Fresnel diffraction of X-rays, which eventually reduces spatial resolution, could be corrected by an iteration procedure, i.e., repetition of Fresnel and inverse Fresnel transformations. However, it was found that the correction is not enough to be effective for all images, especially for images with low contrast. In order to improve the effectiveness of image correction by computer processing, we in this study evaluated the influence of background noise in the iteration procedure through a simulation study. In the study, images of model specimen with known morphology were used as a substitute for the chromosome images, one of the targets of our microscope. Under the condition that artificial noise was distributed on the images randomly, we introduced two different parameters to evaluate noise effects according to each situation where the iteration procedure was not successful, and proposed an upper limit of the noise within which the effective iteration procedure for the chromosome images was possible. The study indicated that applying the new simulation and noise evaluation method was useful for image processing where background noises cannot be ignored compared with specimen images.
Chebyshev polynomial filtered subspace iteration in the discontinuous Galerkin method for large-scale electronic structure calculations

DOE PAGES

Banerjee, Amartya S.; Lin, Lin; Hu, Wei; ...

2016-10-21

The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis (ALB) set to solve the Kohn-Sham equations of density functional theory in a discontinuous Galerkin framework. The adaptive local basis is generated on-the-fly to capture the local material physics and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. A central issue for large-scale calculations, however, is the computation of the electron density (and subsequently, ground state properties) from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) canmore » be used to address this issue and push the envelope in large-scale materials simulations in a discontinuous Galerkin framework. We describe how the subspace filtering steps can be performed in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to the orthogonality of the DG basis set and block-sparse structure of the DG Hamiltonian matrix. The on-the-fly nature of the ALB functions requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI approach in calculations of large-scale twodimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems. In conclusion, employing 55 296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8586 atoms is 90 s, and the time for a graphene sheet containing 11 520 atoms is 75 s.« less
Acceleration of linear stationary iterative processes in multiprocessor computers. II

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romm, Ya.E.

1982-05-01

For pt.I, see Kibernetika, vol.18, no.1, p.47 (1982). For pt.I, see Cybernetics, vol.18, no.1, p.54 (1982). Considers a reduced system of linear algebraic equations x=ax+b, where a=(a/sub ij/) is a real n*n matrix; b is a real vector with common euclidean norm >>>. It is supposed that the existence and uniqueness of solution det (0-a) not equal to e is given, where e is a unit matrix. The linear iterative process converging to x x/sup (k+1)/=fx/sup (k)/, k=0, 1, 2, ..., where the operator f translates r/sup n/ into r/sup n/. In considering implementation of the iterative process (ip) inmore » a multiprocessor system, it is assumed that the number of processors is constant, and are various values of the latter investigated; it is assumed in addition, that the processors perform elementary binary arithmetic operations of addition and multiestimates only include the time of execution of arithmetic operations. With any paralleling of individual iteration, the execution time of the ip is proportional to the number of sequential steps k+1. The author sets the task of reducing the number of sequential steps in the ip so as to execute it in a time proportional to a value smaller than k+1. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the ip, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.« less
Reconstruction of coded aperture images

NASA Technical Reports Server (NTRS)

Bielefeld, Michael J.; Yin, Lo I.

1987-01-01

Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.
Research in Computational Aeroscience Applications Implemented on Advanced Parallel Computing Systems

NASA Technical Reports Server (NTRS)

Wigton, Larry

1996-01-01

Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
Solving Differential Equations Using Modified Picard Iteration

ERIC Educational Resources Information Center

Robin, W. A.

2010-01-01

Many classes of differential equations are shown to be open to solution through a method involving a combination of a direct integration approach with suitably modified Picard iterative procedures. The classes of differential equations considered include typical initial value, boundary value and eigenvalue problems arising in physics and…
A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

PubMed

Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

2014-01-01

It is very time consuming to solve fractional differential equations. The computational complexity of two-dimensional fractional differential equation (2D-TFDE) with iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.
Correction of phase velocity bias caused by strong directional noise sources in high-frequency ambient noise tomography: a case study in Karamay, China

NASA Astrophysics Data System (ADS)

Wang, K.; Luo, Y.; Yang, Y.

2016-12-01

We collect two months of ambient noise data recorded by 35 broadband seismic stations in a 9×11 km area near Karamay, China, and do cross-correlation of noise data between all station pairs. Array beamforming analysis of the ambient noise data shows that ambient noise sources are unevenly distributed and the most energetic ambient noise mainly comes from azimuths of 40o-70o. As a consequence of the strong directional noise sources, surface wave waveforms of the cross-correlations at 1-5 Hz show clearly azimuthal dependence, and direct dispersion measurements from cross-correlations are strongly biased by the dominant noise energy. This bias renders that the dispersion measurements from cross-correlations do not accurately reflect the interstation velocities of surface waves propagating directly from one station to the other, that is, the cross-correlation functions do not retrieve Empirical Green's Functions accurately. To correct the bias caused by unevenly distributed noise sources, we adopt an iterative inversion procedure. The iterative inversion procedure, based on plane-wave modeling, includes three steps: (1) surface wave tomography, (2) estimation of ambient noise energy and (3) phase velocities correction. First, we use synthesized data to test efficiency and stability of the iterative procedure for both homogeneous and heterogeneous media. The testing results show that: (1) the amplitudes of phase velocity bias caused by directional noise sources are significant, reaching 2% and 10% for homogeneous and heterogeneous media, respectively; (2) phase velocity bias can be corrected by the iterative inversion procedure and the convergences of inversion depend on the starting phase velocity map and the complexity of the media. By applying the iterative approach to the real data in Karamay, we further show that phase velocity maps converge after ten iterations and the phase velocity map based on corrected interstation dispersion measurements are more consistent with results from geology surveys than those based on uncorrected ones. As ambient noise in high frequency band (>1Hz) is mostly related to human activities or climate events, both of which have strong directivity, the iterative approach demonstrated here helps improve the accuracy and resolution of ANT in imaging shallow earth structures.
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains

DOE PAGES

Bunting, Gregory; Prakash, Arun; Walsh, Timothy; ...

2018-01-26

Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bunting, Gregory; Prakash, Arun; Walsh, Timothy

Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less
Multilevel acceleration of scattering-source iterations with application to electron transport

DOE PAGES

Drumm, Clif; Fan, Wesley

2017-08-18

Acceleration/preconditioning strategies available in the SCEPTRE radiation transport code are described. A flexible transport synthetic acceleration (TSA) algorithm that uses a low-order discrete-ordinates (S N) or spherical-harmonics (P N) solve to accelerate convergence of a high-order S N source-iteration (SI) solve is described. Convergence of the low-order solves can be further accelerated by applying off-the-shelf incomplete-factorization or algebraic-multigrid methods. Also available is an algorithm that uses a generalized minimum residual (GMRES) iterative method rather than SI for convergence, using a parallel sweep-based solver to build up a Krylov subspace. TSA has been applied as a preconditioner to accelerate the convergencemore » of the GMRES iterations. The methods are applied to several problems involving electron transport and problems with artificial cross sections with large scattering ratios. These methods were compared and evaluated by considering material discontinuities and scattering anisotropy. Observed accelerations obtained are highly problem dependent, but speedup factors around 10 have been observed in typical applications.« less
Iterative Importance Sampling Algorithms for Parameter Estimation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
Establishing Factor Validity Using Variable Reduction in Confirmatory Factor Analysis.

ERIC Educational Resources Information Center

Hofmann, Rich

1995-01-01

Using a 21-statement attitude-type instrument, an iterative procedure for improving confirmatory model fit is demonstrated within the context of the EQS program of P. M. Bentler and maximum likelihood factor analysis. Each iteration systematically eliminates the poorest fitting statement as identified by a variable fit index. (SLD)
HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kannan, Ramakrishnan; Sukumar, Sreenivas R.; Ballard, Grey M.

NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems formore » $$\\WW$$ and $$\\HH$$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $$\\WW$$ and $$\\HH$$ within the alternating iterations.« less
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods

PubMed Central

Smith, David S.; Gore, John C.; Yankeelov, Thomas E.; Welch, E. Brian

2012-01-01

Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 40962 or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 10242 and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images. PMID:22481908
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods.

PubMed

Smith, David S; Gore, John C; Yankeelov, Thomas E; Welch, E Brian

2012-01-01

Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 4096(2) or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 1024(2) and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images.

On the solution of evolution equations based on multigrid and explicit iterative methods

NASA Astrophysics Data System (ADS)

Zhukov, V. T.; Novikova, N. D.; Feodoritova, O. B.

2015-08-01

Two schemes for solving initial-boundary value problems for three-dimensional parabolic equations are studied. One is implicit and is solved using the multigrid method, while the other is explicit iterative and is based on optimal properties of the Chebyshev polynomials. In the explicit iterative scheme, the number of iteration steps and the iteration parameters are chosen as based on the approximation and stability conditions, rather than on the optimization of iteration convergence to the solution of the implicit scheme. The features of the multigrid scheme include the implementation of the intergrid transfer operators for the case of discontinuous coefficients in the equation and the adaptation of the smoothing procedure to the spectrum of the difference operators. The results produced by these schemes as applied to model problems with anisotropic discontinuous coefficients are compared.
New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1998-01-01

Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.
Segmentation of remotely sensed data using parallel region growing

NASA Technical Reports Server (NTRS)

Tilton, J. C.; Cox, S. C.

1983-01-01

The improved spatial resolution of the new earth resources satellites will increase the need for effective utilization of spatial information in machine processing of remotely sensed data. One promising technique is scene segmentation by region growing. Region growing can use spatial information in two ways: only spatially adjacent regions merge together, and merging criteria can be based on region-wide spatial features. A simple region growing approach is described in which the similarity criterion is based on region mean and variance (a simple spatial feature). An effective way to implement region growing for remote sensing is as an iterative parallel process on a large parallel processor. A straightforward parallel pixel-based implementation of the algorithm is explored and its efficiency is compared with sequential pixel-based, sequential region-based, and parallel region-based implementations. Experimental results from on aircraft scanner data set are presented, as is a discussioon of proposed improvements to the segmentation algorithm.
User's Guide for ENSAERO_FE Parallel Finite Element Solver

NASA Technical Reports Server (NTRS)

Eldred, Lloyd B.; Guruswamy, Guru P.

1999-01-01

A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRANKOSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
Non-Cartesian Parallel Imaging Reconstruction

PubMed Central

Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

2014-01-01

Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499
Vortex breakdown simulation

NASA Technical Reports Server (NTRS)

Hafez, M.; Ahmad, J.; Kuruvila, G.; Salas, M. D.

1987-01-01

In this paper, steady, axisymmetric inviscid, and viscous (laminar) swirling flows representing vortex breakdown phenomena are simulated using a stream function-vorticity-circulation formulation and two numerical methods. The first is based on an inverse iteration, where a norm of the solution is prescribed and the swirling parameter is calculated as a part of the output. The second is based on direct Newton iterations, where the linearized equations, for all the unknowns, are solved simultaneously by an efficient banded Gaussian elimination procedure. Several numerical solutions for inviscid and viscous flows are demonstrated, followed by a discussion of the results. Some improvements on previous work have been achieved: first order upwind differences are replaced by second order schemes, line relaxation procedure (with linear convergence rate) is replaced by Newton's iterations (which converge quadratically), and Reynolds numbers are extended from 200 up to 1000.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE PAGES

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...

2017-01-28

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less
An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions

NASA Technical Reports Server (NTRS)

Peters, B. C., Jr.; Walker, H. F.

1978-01-01

This paper addresses the problem of obtaining numerically maximum-likelihood estimates of the parameters for a mixture of normal distributions. In recent literature, a certain successive-approximations procedure, based on the likelihood equations, was shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, we introduce a general iterative procedure, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. We show that, with probability 1 as the sample size grows large, this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. We also show that the step-size which yields optimal local convergence rates for large samples is determined in a sense by the 'separation' of the component normal densities and is bounded below by a number between 1 and 2.
An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions, 2

NASA Technical Reports Server (NTRS)

Peters, B. C., Jr.; Walker, H. F.

1976-01-01

The problem of obtaining numerically maximum likelihood estimates of the parameters for a mixture of normal distributions is addressed. In recent literature, a certain successive approximations procedure, based on the likelihood equations, is shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, a general iterative procedure is introduced, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. With probability 1 as the sample size grows large, it is shown that this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. The step-size which yields optimal local convergence rates for large samples is determined in a sense by the separation of the component normal densities and is bounded below by a number between 1 and 2.
Virtual fringe projection system with nonparallel illumination based on iteration

NASA Astrophysics Data System (ADS)

Zhou, Duo; Wang, Zhangying; Gao, Nan; Zhang, Zonghua; Jiang, Xiangqian

2017-06-01

Fringe projection profilometry has been widely applied in many fields. To set up an ideal measuring system, a virtual fringe projection technique has been studied to assist in the design of hardware configurations. However, existing virtual fringe projection systems use parallel illumination and have a fixed optical framework. This paper presents a virtual fringe projection system with nonparallel illumination. Using an iterative method to calculate intersection points between rays and reference planes or object surfaces, the proposed system can simulate projected fringe patterns and captured images. A new explicit calibration method has been presented to validate the precision of the system. Simulated results indicate that the proposed iterative method outperforms previous systems. Our virtual system can be applied to error analysis, algorithm optimization, and help operators to find ideal system parameter settings for actual measurements.
Variable aperture-based ptychographical iterative engine method

NASA Astrophysics Data System (ADS)

Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

2018-02-01

A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. By adjusting the size of a tiny aperture under the illumination of a parallel light beam to change the illumination on the sample step by step and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since many fewer diffraction patterns are required than in common PIE and the shape, the size, and the position of the aperture need not to be known exactly, this proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; therefore, the proposed technique can be potentially applied for various scientific researches.
A comparative study of serial and parallel aeroelastic computations of wings

NASA Technical Reports Server (NTRS)

Byun, Chansup; Guruswamy, Guru P.

1994-01-01

A procedure for computing the aeroelasticity of wings on parallel multiple-instruction, multiple-data (MIMD) computers is presented. In this procedure, fluids are modeled using Euler equations, and structures are modeled using modal or finite element equations. The procedure is designed in such a way that each discipline can be developed and maintained independently by using a domain decomposition approach. In the present parallel procedure, each computational domain is scalable. A parallel integration scheme is used to compute aeroelastic responses by solving fluid and structural equations concurrently. The computational efficiency issues of parallel integration of both fluid and structural equations are investigated in detail. This approach, which reduces the total computational time by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using various numbers of processors on the Intel iPSC/860.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
MPF: A portable message passing facility for shared memory multiprocessors

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

1987-01-01

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation

NASA Astrophysics Data System (ADS)

Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.

2011-03-01

A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
State Transition Matrix for Perturbed Orbital Motion Using Modified Chebyshev Picard Iteration

NASA Astrophysics Data System (ADS)

Read, Julie L.; Younes, Ahmad Bani; Macomber, Brent; Turner, James; Junkins, John L.

2015-06-01

The Modified Chebyshev Picard Iteration (MCPI) method has recently proven to be highly efficient for a given accuracy compared to several commonly adopted numerical integration methods, as a means to solve for perturbed orbital motion. This method utilizes Picard iteration, which generates a sequence of path approximations, and Chebyshev Polynomials, which are orthogonal and also enable both efficient and accurate function approximation. The nodes consistent with discrete Chebyshev orthogonality are generated using cosine sampling; this strategy also reduces the Runge effect and as a consequence of orthogonality, there is no matrix inversion required to find the basis function coefficients. The MCPI algorithms considered herein are parallel-structured so that they are immediately well-suited for massively parallel implementation with additional speedup. MCPI has a wide range of applications beyond ephemeris propagation, including the propagation of the State Transition Matrix (STM) for perturbed two-body motion. A solution is achieved for a spherical harmonic series representation of earth gravity (EGM2008), although the methodology is suitable for application to any gravity model. Included in this representation the normalized, Associated Legendre Functions are given and verified numerically. Modifications of the classical algorithm techniques, such as rewriting the STM equations in a second-order cascade formulation, gives rise to additional speedup. Timing results for the baseline formulation and this second-order formulation are given.
Optimization and validation of accelerated golden-angle radial sparse MRI reconstruction with self-calibrating GRAPPA operator gridding.

PubMed

Benkert, Thomas; Tian, Ye; Huang, Chenchan; DiBella, Edward V R; Chandarana, Hersh; Feng, Li

2018-07-01

Golden-angle radial sparse parallel (GRASP) MRI reconstruction requires gridding and regridding to transform data between radial and Cartesian k-space. These operations are repeatedly performed in each iteration, which makes the reconstruction computationally demanding. This work aimed to accelerate GRASP reconstruction using self-calibrating GRAPPA operator gridding (GROG) and to validate its performance in clinical imaging. GROG is an alternative gridding approach based on parallel imaging, in which k-space data acquired on a non-Cartesian grid are shifted onto a Cartesian k-space grid using information from multicoil arrays. For iterative non-Cartesian image reconstruction, GROG is performed only once as a preprocessing step. Therefore, the subsequent iterative reconstruction can be performed directly in Cartesian space, which significantly reduces computational burden. Here, a framework combining GROG with GRASP (GROG-GRASP) is first optimized and then compared with standard GRASP reconstruction in 22 prostate patients. GROG-GRASP achieved approximately 4.2-fold reduction in reconstruction time compared with GRASP (∼333 min versus ∼78 min) while maintaining image quality (structural similarity index ≈ 0.97 and root mean square error ≈ 0.007). Visual image quality assessment by two experienced radiologists did not show significant differences between the two reconstruction schemes. With a graphics processing unit implementation, image reconstruction time can be further reduced to approximately 14 min. The GRASP reconstruction can be substantially accelerated using GROG. This framework is promising toward broader clinical application of GRASP and other iterative non-Cartesian reconstruction methods. Magn Reson Med 80:286-293, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Analysis of surface wave propagation in a grounded dielectric slab covered by a resistive sheet

NASA Technical Reports Server (NTRS)

Shively, David G.

1992-01-01

Both parallel and perpendicular polarized surface waves are known to propagate on lossless and lossy grounded dielectric slabs. Surface wave propagation on a grounded dielectric slab covered with a resistive sheet is considered. Both parallel and perpendicular polarizations are examined. Transcendental equations are derived for each polarization and are solved using iterative techniques. Attenuation and phase velocity are shown for representative geometries. The results are applicable to both a grounded slab with a resistive sheet and an ungrounded slab covered on each side with a resistive sheet.
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saad, Yousef

2014-01-16

The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project begun with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners formore » solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version on our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth.10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.« less

GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

PubMed Central

Jia, Erik; Chen, Tianlu

2018-01-01

Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with other three imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Loikith, P.; Lee, H.; McGibbney, L. J.; Whitehall, K. D.

2014-12-01

Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning fast Big Data technology called SciSpark based on ApacheTM Spark. Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based ApacheTM Hadoop by 100x in memory and by 10x on disk, and makes iterative algorithms feasible. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 100 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning (ML) based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesocale Convective Complexes. The goals of SciSpark are to: (1) Decrease the time to compute comparison statistics and plots from minutes to seconds; (2) Allow for interactive exploration of time-series properties over seasons and years; (3) Decrease the time for satellite data ingestion into RCMES to hours; (4) Allow for Level-2 comparisons with higher-order statistics or PDF's in minutes to hours; and (5) Move RCMES into a near real time decision-making platform. We will report on: the architecture and design of SciSpark, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning (sharding) of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDF's, and first metrics quantifying parallel speedups and memory & disk usage.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

NASA Technical Reports Server (NTRS)

Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

1996-01-01

We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

PubMed

O'Connor, B P

2000-08-01

Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
On Nonequivalence of Several Procedures of Structural Equation Modeling

ERIC Educational Resources Information Center

Yuan, Ke-Hai; Chan, Wai

2005-01-01

The normal theory based maximum likelihood procedure is widely used in structural equation modeling. Three alternatives are: the normal theory based generalized least squares, the normal theory based iteratively reweighted least squares, and the asymptotically distribution-free procedure. When data are normally distributed and the model structure…
Comparing Instructional Strategies for Integrating Conceptual and Procedural Knowledge.

ERIC Educational Resources Information Center

Rittle-Johnson, Bethany; Koedinger, Kenneth R.

We compared alternative instructional strategies for integrating knowledge of decimal place value and regrouping concepts with procedures for adding and subtracting decimals. The first condition was based on recent research suggesting that conceptual and procedural knowledge develop in an iterative, hand over hand fashion. In this iterative…
Prefixation of Simplex Pairs in Czech: An Analysis of Spatial Semantics, Distributive Verbs, and Procedural Meanings

ERIC Educational Resources Information Center

Hilchey, Christian Thomas

2014-01-01

This dissertation examines prefixation of simplex pairs. A simplex pair consists of an iterative imperfective and a semelfactive perfective verb. When prefixed, both of these verbs are perfective. The prefixed forms derived from semelfactives are labeled single act verbs, while the prefixed forms derived from iterative imperfective simplex verbs…
Parallel transmission pulse design with explicit control for the specific absorption rate in the presence of radiofrequency errors.

PubMed

Martin, Adrian; Schiavi, Emanuele; Eryaman, Yigitcan; Herraiz, Joaquin L; Gagoski, Borjan; Adalsteinsson, Elfar; Wald, Lawrence L; Guerin, Bastien

2016-06-01

A new framework for the design of parallel transmit (pTx) pulses is presented introducing constraints for local and global specific absorption rate (SAR) in the presence of errors in the radiofrequency (RF) transmit chain. The first step is the design of a pTx RF pulse with explicit constraints for global and local SAR. Then, the worst possible SAR associated with that pulse due to RF transmission errors ("worst-case SAR") is calculated. Finally, this information is used to re-calculate the pulse with lower SAR constraints, iterating this procedure until its worst-case SAR is within safety limits. Analysis of an actual pTx RF transmit chain revealed amplitude errors as high as 8% (20%) and phase errors above 3° (15°) for spokes (spiral) pulses. Simulations show that using the proposed framework, pulses can be designed with controlled "worst-case SAR" in the presence of errors of this magnitude at minor cost of the excitation profile quality. Our worst-case SAR-constrained pTx design strategy yields pulses with local and global SAR within the safety limits even in the presence of RF transmission errors. This strategy is a natural way to incorporate SAR safety factors in the design of pTx pulses. Magn Reson Med 75:2493-2504, 2016. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Variable aperture-based ptychographical iterative engine method.

PubMed

Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

2018-02-01

A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. By adjusting the size of a tiny aperture under the illumination of a parallel light beam to change the illumination on the sample step by step and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since many fewer diffraction patterns are required than in common PIE and the shape, the size, and the position of the aperture need not to be known exactly, this proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; therefore, the proposed technique can be potentially applied for various scientific researches. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE PAGES

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

2015-12-01

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Gauss-Seidel Iterative Method as a Real-Time Pile-Up Solver of Scintillation Pulses

NASA Astrophysics Data System (ADS)

Novak, Roman; Vencelj, Matja¿

2009-12-01

The pile-up rejection in nuclear spectroscopy has been confronted recently by several pile-up correction schemes that compensate for distortions of the signal and subsequent energy spectra artifacts as the counting rate increases. We study here a real-time capability of the event-by-event correction method, which at the core translates to solving many sets of linear equations. Tight time limits and constrained front-end electronics resources make well-known direct solvers inappropriate. We propose a novel approach based on the Gauss-Seidel iterative method, which turns out to be a stable and cost-efficient solution to improve spectroscopic resolution in the front-end electronics. We show the method convergence properties for a class of matrices that emerge in calorimetric processing of scintillation detector signals and demonstrate the ability of the method to support the relevant resolutions. The sole iteration-based error component can be brought below the sliding window induced errors in a reasonable number of iteration steps, thus allowing real-time operation. An area-efficient hardware implementation is proposed that fully utilizes the method's inherent parallelism.
Iterated reaction graphs: simulating complex Maillard reaction pathways.

PubMed

Patel, S; Rabone, J; Russell, S; Tissen, J; Klaffke, W

2001-01-01

This study investigates a new method of simulating a complex chemical system including feedback loops and parallel reactions. The practical purpose of this approach is to model the actual reactions that take place in the Maillard process, a set of food browning reactions, in sufficient detail to be able to predict the volatile composition of the Maillard products. The developed framework, called iterated reaction graphs, consists of two main elements: a soup of molecules and a reaction base of Maillard reactions. An iterative process loops through the reaction base, taking reactants from and feeding products back to the soup. This produces a reaction graph, with molecules as nodes and reactions as arcs. The iterated reaction graph is updated and validated by comparing output with the main products found by classical gas-chromatographic/mass spectrometric analysis. To ensure a realistic output and convergence to desired volatiles only, the approach contains a number of novel elements: rate kinetics are treated as reaction probabilities; only a subset of the true chemistry is modeled; and the reactions are blocked into groups.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
Use of general purpose graphics processing units with MODFLOW

USGS Publications Warehouse

Hughes, Joseph D.; White, Jeremy T.

2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
Parallelization of a blind deconvolution algorithm

NASA Astrophysics Data System (ADS)

Matson, Charles L.; Borelli, Kathy J.

2006-09-01

Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image depending how many frames of data are used for the deblurring and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.
Numerical solution of quadratic matrix equations for free vibration analysis of structures

NASA Technical Reports Server (NTRS)

Gupta, K. K.

1975-01-01

This paper is concerned with the efficient and accurate solution of the eigenvalue problem represented by quadratic matrix equations. Such matrix forms are obtained in connection with the free vibration analysis of structures, discretized by finite 'dynamic' elements, resulting in frequency-dependent stiffness and inertia matrices. The paper presents a new numerical solution procedure of the quadratic matrix equations, based on a combined Sturm sequence and inverse iteration technique enabling economical and accurate determination of a few required eigenvalues and associated vectors. An alternative procedure based on a simultaneous iteration procedure is also described when only the first few modes are the usual requirement. The employment of finite dynamic elements in conjunction with the presently developed eigenvalue routines results in a most significant economy in the dynamic analysis of structures.
Optimal Iterative Task Scheduling for Parallel Simulations.

DTIC Science & Technology

1991-03-01

State University, Pullman, Washington. November 1976. 19. Grimaldi , Ralph P . Discrete and Combinatorial Mathematics. Addison-Wesley. June 1989. 20...2 4.8.1 Problem Description .. .. .. .. ... .. ... .... 4-25 4.8.2 Reasons for Level-Strate- p Failure. .. .. .. .. ... 4-26...f- I CA A* overview................................ C-1 C .2 Sample A* r......................... .... C-I C-3 Evaluation P
Non-iterative Voltage Stability

DOE Office of Scientific and Technical Information (OSTI.GOV)

Makarov, Yuri V.; Vyakaranam, Bharat; Hou, Zhangshuan

2014-09-30

This report demonstrates promising capabilities and performance characteristics of the proposed method using several power systems models. The new method will help to develop a new generation of highly efficient tools suitable for real-time parallel implementation. The ultimate benefit obtained will be early detection of system instability and prevention of system blackouts in real time.
Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cai, Xiao-Chuan; Keyes, David; Yang, Chao

2014-09-29

The focus of the project is on the development and customization of some highly scalable domain decomposition based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, solid deformation), each is modeled by a certain type of Cahn-Hilliard and/or Allen-Cahn equations. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have many advantages such as ease of implementationmore » since only single field solvers are needed, but also exhibit disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces the parallel efficiency. To overcome the disadvantages, fully coupled approaches have been investigated in order to obtain full physics simulations.« less

Turbo Trellis Coded Modulation With Iterative Decoding for Mobile Satellite Communications

NASA Technical Reports Server (NTRS)

Divsalar, D.; Pollara, F.

1997-01-01

In this paper, analytical bounds on the performance of parallel concatenation of two codes, known as turbo codes, and serial concatenation of two codes over fading channels are obtained. Based on this analysis, design criteria for the selection of component trellis codes for MPSK modulation, and a suitable bit-by-bit iterative decoding structure are proposed. Examples are given for throughput of 2 bits/sec/Hz with 8PSK modulation. The parallel concatenation example uses two rate 4/5 8-state convolutional codes with two interleavers. The convolutional codes' outputs are then mapped to two 8PSK modulations. The serial concatenated code example uses an 8-state outer code with rate 4/5 and a 4-state inner trellis code with 5 inputs and 2 x 8PSK outputs per trellis branch. Based on the above mentioned design criteria for fading channels, a method to obtain he structure of the trellis code with maximum diversity is proposed. Simulation results are given for AWGN and an independent Rayleigh fading channel with perfect Channel State Information (CSI).
HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.

PubMed

Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D

2002-08-07

Using iterative three-dimensional (3D) reconstruction techniques for reconstruction of positron emission tomography (PET) is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization for four dedicated cluster (a shared memory fine-grain level on each node utilizing all four processors and a coarse-grain level allowing for 15 nodes) reducing the time for one core iteration from over 7 h to about 35 min.
The novel implicit LU-SGS parallel iterative method based on the diffusion equation of a nuclear reactor on a GPU cluster

NASA Astrophysics Data System (ADS)

Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya

2017-02-01

GPU not only is used in the field of graphic technology but also has been widely used in areas needing a large number of numerical calculations. In the energy industry, because of low carbon, high energy density, high duration and other characteristics, the development of nuclear energy cannot easily be replaced by other energy sources. Management of core fuel is one of the major areas of concern in a nuclear power plant, and it is directly related to the economic benefits and cost of nuclear power. The large-scale reactor core expansion equation is large and complicated, so the calculation of the diffusion equation is crucial in the core fuel management process. In this paper, we use CUDA programming technology on a GPU cluster to run the LU-SGS parallel iterative calculation against the background of the diffusion equation of the reactor. We divide one-dimensional and two-dimensional mesh into a plurality of domains, with each domain evenly distributed on the GPU blocks. A parallel collision scheme is put forward that defines the virtual boundary of the grid exchange information and data transmission by non-stop collision. Compared with the serial program, the experiment shows that GPU greatly improves the efficiency of program execution and verifies that GPU is playing a much more important role in the field of numerical calculations.
Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functionalmore » characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.« less
A new approach for solving the three-dimensional steady Euler equations. I - General theory

NASA Technical Reports Server (NTRS)

Chang, S.-C.; Adamczyk, J. J.

1986-01-01

The present iterative procedure combines the Clebsch potentials and the Munk-Prim (1947) substitution principle with an extension of a semidirect Cauchy-Riemann solver to three dimensions, in order to solve steady, inviscid three-dimensional rotational flow problems in either subsonic or incompressible flow regimes. This solution procedure can be used, upon discretization, to obtain inviscid subsonic flow solutions in a 180-deg turning channel. In addition to accurately predicting the behavior of weak secondary flows, the algorithm can generate solutions for strong secondary flows and will yield acceptable flow solutions after only 10-20 outer loop iterations.
A new approach for solving the three-dimensional steady Euler equations. I - General theory

NASA Astrophysics Data System (ADS)

Chang, S.-C.; Adamczyk, J. J.

1986-08-01

The present iterative procedure combines the Clebsch potentials and the Munk-Prim (1947) substitution principle with an extension of a semidirect Cauchy-Riemann solver to three dimensions, in order to solve steady, inviscid three-dimensional rotational flow problems in either subsonic or incompressible flow regimes. This solution procedure can be used, upon discretization, to obtain inviscid subsonic flow solutions in a 180-deg turning channel. In addition to accurately predicting the behavior of weak secondary flows, the algorithm can generate solutions for strong secondary flows and will yield acceptable flow solutions after only 10-20 outer loop iterations.
Iterative procedures for space shuttle main engine performance models

NASA Technical Reports Server (NTRS)

Santi, L. Michael

1989-01-01

Performance models of the Space Shuttle Main Engine (SSME) contain iterative strategies for determining approximate solutions to nonlinear equations reflecting fundamental mass, energy, and pressure balances within engine flow systems. Both univariate and multivariate Newton-Raphson algorithms are employed in the current version of the engine Test Information Program (TIP). Computational efficiency and reliability of these procedures is examined. A modified trust region form of the multivariate Newton-Raphson method is implemented and shown to be superior for off nominal engine performance predictions. A heuristic form of Broyden's Rank One method is also tested and favorable results based on this algorithm are presented.
Non-iterative volumetric particle reconstruction near moving bodies

NASA Astrophysics Data System (ADS)

Mendelson, Leah; Techet, Alexandra

2017-11-01

When multi-camera 3D PIV experiments are performed around a moving body, the body often obscures visibility of regions of interest in the flow field in a subset of cameras. We evaluate the performance of non-iterative particle reconstruction algorithms used for synthetic aperture PIV (SAPIV) in these partially-occluded regions. We show that when partial occlusions are present, the quality and availability of 3D tracer particle information depends on the number of cameras and reconstruction procedure used. Based on these findings, we introduce an improved non-iterative reconstruction routine for SAPIV around bodies. The reconstruction procedure combines binary masks, already required for reconstruction of the body's 3D visual hull, and a minimum line-of-sight algorithm. This approach accounts for partial occlusions without performing separate processing for each possible subset of cameras. We combine this reconstruction procedure with three-dimensional imaging on both sides of the free surface to reveal multi-fin wake interactions generated by a jumping archer fish. Sufficient particle reconstruction in near-body regions is crucial to resolving the wake structures of upstream fins (i.e., dorsal and anal fins) before and during interactions with the caudal tail.
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Accelerating Computation of DCM for ERP in MATLAB by External Function Calls to the GPU.

PubMed

Wang, Wei-Jen; Hsieh, I-Fan; Chen, Chun-Chuan

2013-01-01

This study aims to improve the performance of Dynamic Causal Modelling for Event Related Potentials (DCM for ERP) in MATLAB by using external function calls to a graphics processing unit (GPU). DCM for ERP is an advanced method for studying neuronal effective connectivity. DCM utilizes an iterative procedure, the expectation maximization (EM) algorithm, to find the optimal parameters given a set of observations and the underlying probability model. As the EM algorithm is computationally demanding and the analysis faces possible combinatorial explosion of models to be tested, we propose a parallel computing scheme using the GPU to achieve a fast estimation of DCM for ERP. The computation of DCM for ERP is dynamically partitioned and distributed to threads for parallel processing, according to the DCM model complexity and the hardware constraints. The performance efficiency of this hardware-dependent thread arrangement strategy was evaluated using the synthetic data. The experimental data were used to validate the accuracy of the proposed computing scheme and quantify the time saving in practice. The simulation results show that the proposed scheme can accelerate the computation by a factor of 155 for the parallel part. For experimental data, the speedup factor is about 7 per model on average, depending on the model complexity and the data. This GPU-based implementation of DCM for ERP gives qualitatively the same results as the original MATLAB implementation does at the group level analysis. In conclusion, we believe that the proposed GPU-based implementation is very useful for users as a fast screen tool to select the most likely model and may provide implementation guidance for possible future clinical applications such as online diagnosis.
3D streamers simulation in a pin to plane configuration using massively parallel computing

NASA Astrophysics Data System (ADS)

Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.

2018-03-01

This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million of mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns-1, regardless of the number of streamers.
Accelerating Computation of DCM for ERP in MATLAB by External Function Calls to the GPU

PubMed Central

Wang, Wei-Jen; Hsieh, I-Fan; Chen, Chun-Chuan

2013-01-01

This study aims to improve the performance of Dynamic Causal Modelling for Event Related Potentials (DCM for ERP) in MATLAB by using external function calls to a graphics processing unit (GPU). DCM for ERP is an advanced method for studying neuronal effective connectivity. DCM utilizes an iterative procedure, the expectation maximization (EM) algorithm, to find the optimal parameters given a set of observations and the underlying probability model. As the EM algorithm is computationally demanding and the analysis faces possible combinatorial explosion of models to be tested, we propose a parallel computing scheme using the GPU to achieve a fast estimation of DCM for ERP. The computation of DCM for ERP is dynamically partitioned and distributed to threads for parallel processing, according to the DCM model complexity and the hardware constraints. The performance efficiency of this hardware-dependent thread arrangement strategy was evaluated using the synthetic data. The experimental data were used to validate the accuracy of the proposed computing scheme and quantify the time saving in practice. The simulation results show that the proposed scheme can accelerate the computation by a factor of 155 for the parallel part. For experimental data, the speedup factor is about 7 per model on average, depending on the model complexity and the data. This GPU-based implementation of DCM for ERP gives qualitatively the same results as the original MATLAB implementation does at the group level analysis. In conclusion, we believe that the proposed GPU-based implementation is very useful for users as a fast screen tool to select the most likely model and may provide implementation guidance for possible future clinical applications such as online diagnosis. PMID:23840507
Evaluation of the effects of patient arm attenuation in SPECT cardiac perfusion imaging

NASA Astrophysics Data System (ADS)

Luo, Dershan; King, M. A.; Pan, Tin-Su; Xia, Weishi

1996-12-01

It was hypothesized that the use of attenuation correction could compensate for degradation in the uniformity of apparent localization of imaging agents seen in cardiac walls when patients are imaged with arms at their sides. Noise-free simulations of the digital MCAT phantom were employed to investigate this hypothesis. Four variations in camera size and collimation scheme were investigated. We observed that: 1) without attenuation correction, the arms had little additional influences on the uniformity of the heart for 180/spl deg/ reconstructions and caused a small increase in nonuniformity for 360/spl deg/ reconstructions, where the impact of both arms was included; 2) change in patient size had more of an impact on count uniformity than the presence of the arms, either with or without attenuation correction; 3) for a low number of iterations and large patient size, slightly better uniformity was obtained from parallel emission data than from fan-beam emission data, independent of whether parallel or fan-beam transmission data was used to reconstruct the attenuation maps; and 4) for all camera configurations, uniformity was improved with attenuation correction and, given sufficient number of iterations, it was compatible among different imaging geometry combinations. Thus, iterative algorithms can compensate for the additional attenuation imposed by larger patients or having the arms on the sides. When the arms are at the sides of the patient, however, a larger radius of rotation may be required, resulting in decreased spatial resolution.
Evaluation of the effects of patient arm attenuation in SPECT cardiac perfusion imaging

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luo, D.; King, M.A.; Pan, T.S.

1996-12-01

It was hypothesized that the use of attenuation correction could compensate for degradation in the uniformity of apparent localization of imaging agents seen in cardiac walls when patients are imaged with arms at their sides. Noise-free simulations of the digital MCAT phantom were employed to investigate this hypothesis. Four variations in camera size and collimation scheme were investigated. The authors observed that: (1) without attenuation correction, the arms had little additional influences on the uniformity of the heart for 180{degree} reconstructions and caused a small increase in nonuniformity for 360{degree} reconstructions, where the impact of both arms was included; (2)more » change in patient size had more of an impact on count uniformity than the presence of the arms, either with or without attenuation correction; (3) for a low number of iterations and large patient size, slightly better uniformity was obtained from parallel emission data than from fan-beam emission data, independent of whether parallel or fan-beam transmission data was used to reconstruct the attenuation maps; and (4) for all camera configurations, uniformity was improved with attenuation correction and, given sufficient number of iterations, it was compatible among different imaging geometry combinations. Thus, iterative algorithms can compensate for the additional attenuation imposed by larger patients or having the arms on the sides. When the arms are at the sides of the patient, however, a larger radius of rotation may be required, resulting in decreased spatial resolution.« less
Improving the iterative Linear Interaction Energy approach using automated recognition of configurational transitions.

PubMed

Vosmeer, C Ruben; Kooi, Derk P; Capoferri, Luigi; Terpstra, Margreet M; Vermeulen, Nico P E; Geerke, Daan P

2016-01-01

Recently an iterative method was proposed to enhance the accuracy and efficiency of ligand-protein binding affinity prediction through linear interaction energy (LIE) theory. For ligand binding to flexible Cytochrome P450s (CYPs), this method was shown to decrease the root-mean-square error and standard deviation of error prediction by combining interaction energies of simulations starting from different conformations. Thereby, different parts of protein-ligand conformational space are sampled in parallel simulations. The iterative LIE framework relies on the assumption that separate simulations explore different local parts of phase space, and do not show transitions to other parts of configurational space that are already covered in parallel simulations. In this work, a method is proposed to (automatically) detect such transitions during the simulations that are performed to construct LIE models and to predict binding affinities. Using noise-canceling techniques and splines to fit time series of the raw data for the interaction energies, transitions during simulation between different parts of phase space are identified. Boolean selection criteria are then applied to determine which parts of the interaction energy trajectories are to be used as input for the LIE calculations. Here we show that this filtering approach benefits the predictive quality of our previous CYP 2D6-aryloxypropanolamine LIE model. In addition, an analysis is performed of the gain in computational efficiency that can be obtained from monitoring simulations using the proposed filtering method and by prematurely terminating simulations accordingly.
DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.
Signal detection theory and vestibular perception: III. Estimating unbiased fit parameters for psychometric functions.

PubMed

Chaudhuri, Shomesh E; Merfeld, Daniel M

2013-03-01

Psychophysics generally relies on estimating a subject's ability to perform a specific task as a function of an observed stimulus. For threshold studies, the fitted functions are called psychometric functions. While fitting psychometric functions to data acquired using adaptive sampling procedures (e.g., "staircase" procedures), investigators have encountered a bias in the spread ("slope" or "threshold") parameter that has been attributed to the serial dependency of the adaptive data. Using simulations, we confirm this bias for cumulative Gaussian parametric maximum likelihood fits on data collected via adaptive sampling procedures, and then present a bias-reduced maximum likelihood fit that substantially reduces the bias without reducing the precision of the spread parameter estimate and without reducing the accuracy or precision of the other fit parameters. As a separate topic, we explain how to implement this bias reduction technique using generalized linear model fits as well as other numeric maximum likelihood techniques such as the Nelder-Mead simplex. We then provide a comparison of the iterative bootstrap and observed information matrix techniques for estimating parameter fit variance from adaptive sampling procedure data sets. The iterative bootstrap technique is shown to be slightly more accurate; however, the observed information technique executes in a small fraction (0.005 %) of the time required by the iterative bootstrap technique, which is an advantage when a real-time estimate of parameter fit variance is required.
Bidirectional iterative parcellation of diffusion weighted imaging data: Separating cortical regions connected by the arcuate fasciculus and extreme capsule

PubMed Central

Patterson, Dianne K.; Van Petten, Cyma; Beeson, Pélagie M.; Rapcsak, Steven Z.; Plante, Elena

2014-01-01

This paper introduces a Bidirectional Iterative Parcellation (BIP) procedure designed to identify the location and size of connected cortical regions (parcellations) at both ends of a white matter tract in diffusion weighted images. The procedure applies the FSL option “probabilistic tracking with classification targets” in a bidirectional and iterative manner. To assess the utility of BIP, we applied the procedure to the problem of parcellating a limited set of well-established gray matter seed regions associated with the dorsal (arcuate fasciculus/superior longitudinal fasciculus) and ventral (extreme capsule fiber system) white matter tracts in the language networks of 97 participants. These left hemisphere seed regions and the two white matter tracts, along with their right hemisphere homologues, provided an excellent test case for BIP because the resulting parcellations overlap and their connectivity via the arcuate fasciculi and extreme capsule fiber systems are well studied. The procedure yielded both confirmatory and novel findings. Specifically, BIP confirmed that each tract connects within the seed regions in unique, but expected ways. Novel findings included increasingly left-lateralized parcellations associated with the arcuate fasciculus/superior longitudinal fasciculus as a function of age and education. These results demonstrate that BIP is an easily implemented technique that successfully confirmed cortical connectivity patterns predicted in the literature, and has the potential to provide new insights regarding the architecture of the brain. PMID:25173414
Railway track geometry degradation due to differential settlement of ballast/subgrade - Numerical prediction by an iterative procedure

NASA Astrophysics Data System (ADS)

Nielsen, Jens C. O.; Li, Xin

2018-01-01

An iterative procedure for numerical prediction of long-term degradation of railway track geometry (longitudinal level) due to accumulated differential settlement of ballast/subgrade is presented. The procedure is based on a time-domain model of dynamic vehicle-track interaction to calculate the contact loads between sleepers and ballast in the short-term, which are then used in an empirical model to determine the settlement of ballast/subgrade below each sleeper in the long-term. The number of load cycles (wheel passages) accounted for in each iteration step is determined by an adaptive step length given by a maximum settlement increment. To reduce the computational effort for the simulations of dynamic vehicle-track interaction, complex-valued modal synthesis with a truncated modal set is applied for the linear subset of the discretely supported track model with non-proportional spatial distribution of viscous damping. Gravity loads and state-dependent vehicle, track and wheel-rail contact conditions are accounted for as external loads on the modal model, including situations involving loss of (and recovered) wheel-rail contact, impact between hanging sleeper and ballast, and/or a prescribed variation of non-linear track support stiffness properties along the track model. The procedure is demonstrated by calculating the degradation of longitudinal level over time as initiated by a prescribed initial local rail irregularity (dipped welded rail joint).
A methodology for finding the optimal iteration number of the SIRT algorithm for quantitative Electron Tomography.

PubMed

Okariz, Ana; Guraya, Teresa; Iturrondobeitia, Maider; Ibarretxe, Julen

2017-02-01

The SIRT (Simultaneous Iterative Reconstruction Technique) algorithm is commonly used in Electron Tomography to calculate the original volume of the sample from noisy images, but the results provided by this iterative procedure are strongly dependent on the specific implementation of the algorithm, as well as on the number of iterations employed for the reconstruction. In this work, a methodology for selecting the iteration number of the SIRT reconstruction that provides the most accurate segmentation is proposed. The methodology is based on the statistical analysis of the intensity profiles at the edge of the objects in the reconstructed volume. A phantom which resembles a a carbon black aggregate has been created to validate the methodology and the SIRT implementations of two free software packages (TOMOJ and TOMO3D) have been used. Copyright © 2016 Elsevier B.V. All rights reserved.

Parallel architectures for iterative methods on adaptive, block structured grids

NASA Technical Reports Server (NTRS)

Gannon, D.; Vanrosendale, J.

1983-01-01

A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Iterative method of construction of a bifurcation diagram of autorotation motions for a system with one degree of freedom

NASA Astrophysics Data System (ADS)

Klimina, L. A.

2018-05-01

The modification of the Picard approach is suggested that is targeted to the construction of a bifurcation diagram of 2π -periodic motions of mechanical system with a cylindrical phase space. Each iterative step is based on principles of averaging and energy balance similar to the Poincare-Pontryagin approach. If the iterative procedure converges, it provides the periodic trajectory of the system depending on the bifurcation parameter of the model. The method is applied to describe self-sustained rotations in the model of an aerodynamic pendulum.
An iterative method for the localization of a neutron source in a large box (container)

NASA Astrophysics Data System (ADS)

Dubinski, S.; Presler, O.; Alfassi, Z. B.

2007-12-01

The localization of an unknown neutron source in a bulky box was studied. This can be used for the inspection of cargo, to prevent the smuggling of neutron and α emitters. It is important to localize the source from the outside for safety reasons. Source localization is necessary in order to determine its activity. A previous study showed that, by using six detectors, three on each parallel face of the box (460×420×200 mm 3), the location of the source can be found with an average distance of 4.73 cm between the real source position and the calculated one and a maximal distance of about 9 cm. Accuracy was improved in this work by applying an iteration method based on four fixed detectors and the successive iteration of positioning of an external calibrating source. The initial positioning of the calibrating source is the plane of detectors 1 and 2. This method finds the unknown source location with an average distance of 0.78 cm between the real source position and the calculated one and a maximum distance of 3.66 cm for the same box. For larger boxes, localization without iterations requires an increase in the number of detectors, while localization with iterations requires only an increase in the number of iteration steps. In addition to source localization, two methods for determining the activity of the unknown source were also studied.
Advances in Global Adjoint Tomography -- Massive Data Assimilation

NASA Astrophysics Data System (ADS)

Ruan, Y.; Lei, W.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Krischer, L.; Tromp, J.

2015-12-01

Azimuthal anisotropy and anelasticity are key to understanding a myriad of processes in Earth's interior. Resolving these properties requires accurate simulations of seismic wave propagation in complex 3-D Earth models and an iterative inversion strategy. In the wake of successes in regional studies(e.g., Chen et al., 2007; Tape et al., 2009, 2010; Fichtner et al., 2009, 2010; Chen et al.,2010; Zhu et al., 2012, 2013; Chen et al., 2015), we are employing adjoint tomography based on a spectral-element method (Komatitsch & Tromp 1999, 2002) on a global scale using the supercomputer ''Titan'' at Oak Ridge National Laboratory. After 15 iterations, we have obtained a high-resolution transversely isotropic Earth model (M15) using traveltime data from 253 earthquakes. To obtain higher resolution images of the emerging new features and to prepare the inversion for azimuthal anisotropy and anelasticity, we expanded the original dataset with approximately 4,220 additional global earthquakes (Mw5.5-7.0) --occurring between 1995 and 2014-- and downloaded 300-minute-long time series for all available data archived at the IRIS Data Management Center, ORFEUS, and F-net. Ocean Bottom Seismograph data from the last decade are also included to maximize data coverage. In order to handle the huge dataset and solve the I/O bottleneck in global adjoint tomography, we implemented a python-based parallel data processing workflow based on the newly developed Adaptable Seismic Data Format (ASDF). With the help of the data selection tool MUSTANG developed by IRIS, we cleaned our dataset and assembled event-based ASDF files for parallel processing. We have started Centroid Moment Tensors (CMT) inversions for all 4,220 earthquakes with the latest model M15, and selected high-quality data for measurement. We will statistically investigate each channel using synthetic seismograms calculated in M15 for updated CMTs and identify problematic channels. In addition to data screening, we also modified the conventional multi-taper method to obtain better frequency-dependent measurements of surface-wave phase and amplitude anomalies, and therefore more accurate adjoint sources, which are particularly important for anelastic tomography. We present a summary of these data culling and processing procedures for global adjoint tomography.
Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.

1990-01-01

The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized by using central differences. An algebraic turbulence model is also incorporated. The system of linear equations which arises out of the linearization of a fully implicit scheme is solved iteratively by the well known methods of GMRES (Generalized Minimum Residual technique) and Chebyschev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes, but has wide applications in parallel computing and unstructured mesh computations.
Domain decomposition methods in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Gropp, William D.; Keyes, David E.

1991-01-01

The divide-and-conquer paradigm of iterative domain decomposition, or substructuring, has become a practical tool in computational fluid dynamic applications because of its flexibility in accommodating adaptive refinement through locally uniform (or quasi-uniform) grids, its ability to exploit multiple discretizations of the operator equations, and the modular pathway it provides towards parallelism. These features are illustrated on the classic model problem of flow over a backstep using Newton's method as the nonlinear iteration. Multiple discretizations (second-order in the operator and first-order in the preconditioner) and locally uniform mesh refinement pay dividends separately, and they can be combined synergistically. Sample performance results are included from an Intel iPSC/860 hypercube implementation.
Gas Flows in Rocket Motors. Volume 2. Appendix C. Time Iterative Solution of Viscous Supersonic Flow

DTIC Science & Technology

1989-08-01

by b!ock number) FIELD GROUP SUB- GROUP nozzle analysis, Navier-Stokes, turbulent flow, equilibrium S 20 04 chemistry 19. ABSTRACT (Continue on reverse... quasi -conservative formulations lead to unacrepilably large mass conservation errors. Along with the investigations of Navier-Stkes algorithins...Characteristics Splitting ................................... 125 4.2.3 Non -Iterative PNS Procedure ............................... 125 4.2.4 Comparisons of
Parallel approach for bioinspired algorithms

NASA Astrophysics Data System (ADS)

Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina

2018-05-01

In the paper, a probabilistic parallel approach based on the population heuristic, such as a genetic algorithm, is suggested. The authors proposed using a multithreading approach at the micro level at which new alternative solutions are generated. On each iteration, several threads that independently used the same population to generate new solutions can be started. After the work of all threads, a selection operator combines obtained results in the new population. To confirm the effectiveness of the suggested approach, the authors have developed software on the basis of which experimental computations can be carried out. The authors have considered a classic optimization problem – finding a Hamiltonian cycle in a graph. Experiments show that due to the parallel approach at the micro level, increment of running speed can be obtained on graphs with 250 and more vertices.
ITER Central Solenoid Module Fabrication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, John

The fabrication of the modules for the ITER Central Solenoid (CS) has started in a dedicated production facility located in Poway, California, USA. The necessary tools have been designed, built, installed, and tested in the facility to enable the start of production. The current schedule has first module fabrication completed in 2017, followed by testing and subsequent shipment to ITER. The Central Solenoid is a key component of the ITER tokamak providing the inductive voltage to initiate and sustain the plasma current and to position and shape the plasma. The design of the CS has been a collaborative effort betweenmore » the US ITER Project Office (US ITER), the international ITER Organization (IO) and General Atomics (GA). GA’s responsibility includes: completing the fabrication design, developing and qualifying the fabrication processes and tools, and then completing the fabrication of the seven 110 tonne CS modules. The modules will be shipped separately to the ITER site, and then stacked and aligned in the Assembly Hall prior to insertion in the core of the ITER tokamak. A dedicated facility in Poway, California, USA has been established by GA to complete the fabrication of the seven modules. Infrastructure improvements included thick reinforced concrete floors, a diesel generator for backup power, along with, cranes for moving the tooling within the facility. The fabrication process for a single module requires approximately 22 months followed by five months of testing, which includes preliminary electrical testing followed by high current (48.5 kA) tests at 4.7K. The production of the seven modules is completed in a parallel fashion through ten process stations. The process stations have been designed and built with most stations having completed testing and qualification for carrying out the required fabrication processes. The final qualification step for each process station is achieved by the successful production of a prototype coil. Fabrication of the first ITER module is in progress. The seven modules will be individually shipped to Cadarache, France upon their completion. This paper describes the processes and status of the fabrication of the CS Modules for ITER.« less
Acoustic scattering by arbitrary distributions of disjoint, homogeneous cylinders or spheres.

PubMed

Hesford, Andrew J; Astheimer, Jeffrey P; Waag, Robert C

2010-05-01

A T-matrix formulation is presented to compute acoustic scattering from arbitrary, disjoint distributions of cylinders or spheres, each with arbitrary, uniform acoustic properties. The generalized approach exploits the similarities in these scattering problems to present a single system of equations that is easily specialized to cylindrical or spherical scatterers. By employing field expansions based on orthogonal harmonic functions, continuity of pressure and normal particle velocity are directly enforced at each scatterer using diagonal, analytic expressions to eliminate the need for integral equations. The effect of a cylinder or sphere that encloses all other scatterers is simulated with an outer iterative procedure that decouples the inner-object solution from the effect of the enclosing object to improve computational efficiency when interactions among the interior objects are significant. Numerical results establish the validity and efficiency of the outer iteration procedure for nested objects. Two- and three-dimensional methods that employ this outer iteration are used to measure and characterize the accuracy of two-dimensional approximations to three-dimensional scattering of elevation-focused beams.
Parallel Symmetric Eigenvalue Problem Solvers

DTIC Science & Technology

2015-05-01

get research, tutoring, and mentoring experience as an undergraduate. Last but not least, I thank my family for their love and support. v TABLE OF...32 4.6.2 Choice of the Ritz shifts . . . . . . . . . . . . . . . . . . . . 37 4.7 Relationship between...pencil. I will conclude with a discussion of the relationship between Trace- Min and simultaneous iteration. If both methods solve the linear systems
Tokamak und Stellarator - zwei Wege zur Fusionsenergie: Fusionsforschung

NASA Astrophysics Data System (ADS)

Milch, Isabella

2006-07-01

Im Laufe der Fusionsforschung haben sich zwei Bautypen für ein zukünftiges Kraftwerk als besonders aussichtsreich erwiesen: Tokamak und Stellarator. Mit dem geplanten Tokamak-Experimentalreaktor ITER steht die internationale Fusionsforschung vor der Demonstration eines Energie liefernden Plasmas. Parallel soll die in Greifswald entstehende Forschungsanlage Wendelstein 7-X die Kraftwerkstauglichkeit des alternativen Bauprinzips der Stellaratoren zeigen.
A fast iterative scheme for the linearized Boltzmann equation

NASA Astrophysics Data System (ADS)

Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.

2017-06-01

Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference between these results and those using the hard-sphere potential is discussed.
Multi-GPU Acceleration of Branchless Distance Driven Projection and Backprojection for Clinical Helical CT.

PubMed

Mitra, Ayan; Politte, David G; Whiting, Bruce R; Williamson, Jeffrey F; O'Sullivan, Joseph A

2017-01-01

Model-based image reconstruction (MBIR) techniques have the potential to generate high quality images from noisy measurements and a small number of projections which can reduce the x-ray dose in patients. These MBIR techniques rely on projection and backprojection to refine an image estimate. One of the widely used projectors for these modern MBIR based technique is called branchless distance driven (DD) projection and backprojection. While this method produces superior quality images, the computational cost of iterative updates keeps it from being ubiquitous in clinical applications. In this paper, we provide several new parallelization ideas for concurrent execution of the DD projectors in multi-GPU systems using CUDA programming tools. We have introduced some novel schemes for dividing the projection data and image voxels over multiple GPUs to avoid runtime overhead and inter-device synchronization issues. We have also reduced the complexity of overlap calculation of the algorithm by eliminating the common projection plane and directly projecting the detector boundaries onto image voxel boundaries. To reduce the time required for calculating the overlap between the detector edges and image voxel boundaries, we have proposed a pre-accumulation technique to accumulate image intensities in perpendicular 2D image slabs (from a 3D image) before projection and after backprojection to ensure our DD kernels run faster in parallel GPU threads. For the implementation of our iterative MBIR technique we use a parallel multi-GPU version of the alternating minimization (AM) algorithm with penalized likelihood update. The time performance using our proposed reconstruction method with Siemens Sensation 16 patient scan data shows an average of 24 times speedup using a single TITAN X GPU and 74 times speedup using 3 TITAN X GPUs in parallel for combined projection and backprojection.
Linear scaling computation of the Fock matrix. VI. Data parallel computation of the exchange-correlation matrix

NASA Astrophysics Data System (ADS)

Gan, Chee Kwan; Challacombe, Matt

2003-05-01

Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
Application of Four-Point Newton-EGSOR iteration for the numerical solution of 2D Porous Medium Equations

NASA Astrophysics Data System (ADS)

Chew, J. V. L.; Sulaiman, J.

2017-09-01

Partial differential equations that are used in describing the nonlinear heat and mass transfer phenomena are difficult to be solved. For the case where the exact solution is difficult to be obtained, it is necessary to use a numerical procedure such as the finite difference method to solve a particular partial differential equation. In term of numerical procedure, a particular method can be considered as an efficient method if the method can give an approximate solution within the specified error with the least computational complexity. Throughout this paper, the two-dimensional Porous Medium Equation (2D PME) is discretized by using the implicit finite difference scheme to construct the corresponding approximation equation. Then this approximation equation yields a large-sized and sparse nonlinear system. By using the Newton method to linearize the nonlinear system, this paper deals with the application of the Four-Point Newton-EGSOR (4NEGSOR) iterative method for solving the 2D PMEs. In addition to that, the efficiency of the 4NEGSOR iterative method is studied by solving three examples of the problems. Based on the comparative analysis, the Newton-Gauss-Seidel (NGS) and the Newton-SOR (NSOR) iterative methods are also considered. The numerical findings show that the 4NEGSOR method is superior to the NGS and the NSOR methods in terms of the number of iterations to get the converged solutions, the time of computation and the maximum absolute errors produced by the methods.
Evaluation of an Airborne Spacing Concept, On-Board Spacing Tool, and Pilot Interface

NASA Technical Reports Server (NTRS)

Swieringa, Kurt; Murdoch, Jennifer L.; Baxley, Brian; Hubbs, Clay

2011-01-01

The number of commercial aircraft operations is predicted to increase in the next ten years, creating a need for improved operational efficiency. Two areas believed to offer significant increases in efficiency are optimized profile descents and dependent parallel runway operations. It is envisioned that during both of these types of operations, flight crews will precisely space their aircraft behind preceding aircraft at air traffic control assigned intervals to increase runway throughput and maximize the use of existing infrastructure. This paper describes a human-in-the-loop experiment designed to study the performance of an onboard spacing algorithm and pilots ratings of the usability and acceptability of an airborne spacing concept that supports dependent parallel arrivals. Pilot participants flew arrivals into the Dallas Fort-Worth terminal environment using one of three different simulators located at the National Aeronautics and Space Administration s (NASA) Langley Research Center. Scenarios were flown using Interval Management with Spacing (IM-S) and Required Time of Arrival (RTA) control methods during conditions of no error, error in the forecast wind, and offset (disturbance) to the arrival flow. Results indicate that pilots delivered their aircraft to the runway threshold within +/- 3.5 seconds of their assigned arrival time and reported that both the IM-S and RTA procedures were associated with low workload levels. In general, pilots found the IM-S concept, procedures, speeds, and interface acceptable; with 92% of pilots rating the procedures as complete and logical, 218 out of 240 responses agreeing that the IM-S speeds were acceptable, and 63% of pilots reporting that the displays were easy to understand and displayed in appropriate locations. The 22 (out of 240) responses, indicating that the commanded speeds were not acceptable and appropriate occurred during scenarios containing wind error and offset error. Concerns cited included the occurrence of multiple speed changes within a short time period, speed changes required within twenty miles of the runway, and an increase in airspeed followed shortly by a decrease in airspeed. Within this paper, appropriate design recommendations are provided, and the need for continued, iterative human-centered design is discussed.
Enhancement of event related potentials by iterative restoration algorithms

NASA Astrophysics Data System (ADS)

Pomalaza-Raez, Carlos A.; McGillem, Clare D.

1986-12-01

An iterative procedure for the restoration of event related potentials (ERP) is proposed and implemented. The method makes use of assumed or measured statistical information about latency variations in the individual ERP components. The signal model used for the restoration algorithm consists of a time-varying linear distortion and a positivity/negativity constraint. Additional preprocessing in the form of low-pass filtering is needed in order to mitigate the effects of additive noise. Numerical results obtained with real data show clearly the presence of enhanced and regenerated components in the restored ERP's. The procedure is easy to implement which makes it convenient when compared to other proposed techniques for the restoration of ERP signals.
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation

NASA Astrophysics Data System (ADS)

Costa, Carlos A. N.; Campos, Itamara S.; Costa, Jessé C.; Neto, Francisco A.; Schleicher, Jörg; Novais, Amélia

2013-08-01

Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality.
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Rizk, Yehia M.

1999-01-01

The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Sidel" fashion. The programming efforts for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group are approximately balanced. A proper number of threads are initially allocated to each group, and in subsequent iterations during the run-time, the number of threads are adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in Overflow.

The numerical evaluation of maximum-likelihood estimates of the parameters for a mixture of normal distributions from partially identified samples

NASA Technical Reports Server (NTRS)

Walker, H. F.

1976-01-01

Likelihood equations determined by the two types of samples which are necessary conditions for a maximum-likelihood estimate were considered. These equations suggest certain successive approximations iterative procedures for obtaining maximum likelihood estimates. The procedures, which are generalized steepest ascent (deflected gradient) procedures, contain those of Hosmer as a special case.
[Urologic surgical procedures in patients with uterus neoplasm and colon-rectal cancer].

PubMed

Marino, G; Laudi, M; Capussotti, L; Zola, P

2008-01-01

INTRODUCTION. During the last 30 years, the multidisciplinary treatments of colon and uterus neoplasm have yielded an increase in total survival rates, fostering therefore the increase of cases with regional relapse involving the urinary tract. In these cases the iterative surgery can be performed, if no disease secondary to pelvic pain, haemostatic or debulking procedure is present, and must be considered and discussed with the patient, according to his/her general status. MATERIALS AND METHODS. From 1997 to August 2007 we performed altogether 43 pelvic iterative surgeries, with simultaneous urologic surgical procedure because of pelvic tumor relapse in patients with uterus neoplasm and colon and rectal cancer. In 4 cases of anal cancer, the urological procedure were: one radical prostatectomy with continent vesicostomy in the first case, while in the other 3 cases radical pelvectomy with double-barrelled uretero-cutaneostomy. In 23 cases of colon cancer, the urologic procedures were: 9 cases of radical cystectomy with double-barrelled uretero-cutaneostomy, 4 cases of radical cystectomy with uretero-ileo-cutaneostomy according to Bricker- Wallace II procedure, and 9 cases of partial cystectomy with pelvic ureterectomy and ureterocystoneostomy according to Lich-Gregoire technique (7 cases) and Lembo-Boari (2 cases) procedure. In 16 cases of uterus cancer, the urological procedure were: 7 cases of partial cystectomy with pelvic ureterectomy and uretero-cystoneostomy according to Lich-Gregoire procedure; in 3 cases, a radical cystectomy with urinary continent cutaneous diversion according to the Ileal T-pouch procedure; 2 cases of total pelvectomy and double uretero-cutaneostomy, and 4 cases of bilateral uretero-cutaneostomy. RESULTS. No patients died in the perioperative time; early systemic complications were: 2 esophageal candidiasis, 1 case of venous thrombosis. CONCLUSIONS. The iterative pelvic surgery in the case of oncological relapse involving the urinary tract aims to achieve the best quality of life with the utmost oncological radicality. The equation: eradication of pelvic neoplasm and urinary tract reconstruction, with acceptable quality of life, will be the future target; nevertheless, it is not possible to establish guidelines beforehand, and the therapy must be adapted to each single case.
Flexible Method for Developing Tactics, Techniques, and Procedures for Future Capabilities

DTIC Science & Technology

2009-02-01

levels of ability, military experience, and motivation, (b) number and type of significant events, and (c) other sources of natural variability...research has developed a number of specific instruments designed to aid in this process. Second, the iterative, feed-forward nature of the method allows...FLEX method), but still lack the structured KE approach and iterative, feed-forward nature of the FLEX method. To facilitate decision making
Improving cluster-based missing value estimation of DNA microarray data.

PubMed

Brás, Lígia P; Menezes, José C

2007-06-01

We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
Investigation of a Parabolic Iterative Solver for Three-dimensional Configurations

NASA Technical Reports Server (NTRS)

Nark, Douglas M.; Watson, Willie R.; Mani, Ramani

2007-01-01

A parabolic iterative solution procedure is investigated that seeks to extend the parabolic approximation used within the internal propagation module of the duct noise propagation and radiation code CDUCT-LaRC. The governing convected Helmholtz equation is split into a set of coupled equations governing propagation in the positive and negative directions. The proposed method utilizes an iterative procedure to solve the coupled equations in an attempt to account for possible reflections from internal bifurcations, impedance discontinuities, and duct terminations. A geometry consistent with the NASA Langley Curved Duct Test Rig is considered and the effects of acoustic treatment and non-anechoic termination are included. Two numerical implementations are studied and preliminary results indicate that improved accuracy in predicted amplitude and phase can be obtained for modes at a cut-off ratio of 1.7. Further predictions for modes at a cut-off ratio of 1.1 show improvement in predicted phase at the expense of increased amplitude error. Possible methods of improvement are suggested based on analytic and numerical analysis. It is hoped that coupling the parabolic iterative approach with less efficient, high fidelity finite element approaches will ultimately provide the capability to perform efficient, higher fidelity acoustic calculations within complex 3-D geometries for impedance eduction and noise propagation and radiation predictions.
Tuning without over-tuning: parametric uncertainty quantification for the NEMO ocean model

NASA Astrophysics Data System (ADS)

Williamson, Daniel B.; Blaker, Adam T.; Sinha, Bablu

2017-04-01

In this paper we discuss climate model tuning and present an iterative automatic tuning method from the statistical science literature. The method, which we refer to here as iterative refocussing (though also known as history matching), avoids many of the common pitfalls of automatic tuning procedures that are based on optimisation of a cost function, principally the over-tuning of a climate model due to using only partial observations. This avoidance comes by seeking to rule out parameter choices that we are confident could not reproduce the observations, rather than seeking the model that is closest to them (a procedure that risks over-tuning). We comment on the state of climate model tuning and illustrate our approach through three waves of iterative refocussing of the NEMO (Nucleus for European Modelling of the Ocean) ORCA2 global ocean model run at 2° resolution. We show how at certain depths the anomalies of global mean temperature and salinity in a standard configuration of the model exceeds 10 standard deviations away from observations and show the extent to which this can be alleviated by iterative refocussing without compromising model performance spatially. We show how model improvements can be achieved by simultaneously perturbing multiple parameters, and illustrate the potential of using low-resolution ensembles to tune NEMO ORCA configurations at higher resolutions.
Image transmission system using adaptive joint source and channel decoding

NASA Astrophysics Data System (ADS)

Liu, Weiliang; Daut, David G.

2005-03-01

In this paper, an adaptive joint source and channel decoding method is designed to accelerate the convergence of the iterative log-dimain sum-product decoding procedure of LDPC codes as well as to improve the reconstructed image quality. Error resilience modes are used in the JPEG2000 source codec, which makes it possible to provide useful source decoded information to the channel decoder. After each iteration, a tentative decoding is made and the channel decoded bits are then sent to the JPEG2000 decoder. Due to the error resilience modes, some bits are known to be either correct or in error. The positions of these bits are then fed back to the channel decoder. The log-likelihood ratios (LLR) of these bits are then modified by a weighting factor for the next iteration. By observing the statistics of the decoding procedure, the weighting factor is designed as a function of the channel condition. That is, for lower channel SNR, a larger factor is assigned, and vice versa. Results show that the proposed joint decoding methods can greatly reduce the number of iterations, and thereby reduce the decoding delay considerably. At the same time, this method always outperforms the non-source controlled decoding method up to 5dB in terms of PSNR for various reconstructed images.
A fast method to emulate an iterative POCS image reconstruction algorithm.

PubMed

Zeng, Gengsheng L

2017-10-01

Iterative image reconstruction algorithms are commonly used to optimize an objective function, especially when the objective function is nonquadratic. Generally speaking, the iterative algorithms are computationally inefficient. This paper presents a fast algorithm that has one backprojection and no forward projection. This paper derives a new method to solve an optimization problem. The nonquadratic constraint, for example, an edge-preserving denoising constraint is implemented as a nonlinear filter. The algorithm is derived based on the POCS (projections onto projections onto convex sets) approach. A windowed FBP (filtered backprojection) algorithm enforces the data fidelity. An iterative procedure, divided into segments, enforces edge-enhancement denoising. Each segment performs nonlinear filtering. The derived iterative algorithm is computationally efficient. It contains only one backprojection and no forward projection. Low-dose CT data are used for algorithm feasibility studies. The nonlinearity is implemented as an edge-enhancing noise-smoothing filter. The patient studies results demonstrate its effectiveness in processing low-dose x ray CT data. This fast algorithm can be used to replace many iterative algorithms. © 2017 American Association of Physicists in Medicine.
Reducing Design Cycle Time and Cost Through Process Resequencing

NASA Technical Reports Server (NTRS)

Rogers, James L.

2004-01-01

In today's competitive environment, companies are under enormous pressure to reduce the time and cost of their design cycle. One method for reducing both time and cost is to develop an understanding of the flow of the design processes and the effects of the iterative subcycles that are found in complex design projects. Once these aspects are understood, the design manager can make decisions that take advantage of decomposition, concurrent engineering, and parallel processing techniques to reduce the total time and the total cost of the design cycle. One software tool that can aid in this decision-making process is the Design Manager's Aid for Intelligent Decomposition (DeMAID). The DeMAID software minimizes the feedback couplings that create iterative subcycles, groups processes into iterative subcycles, and decomposes the subcycles into a hierarchical structure. The real benefits of producing the best design in the least time and at a minimum cost are obtained from sequencing the processes in the subcycles.
On the inversion of geodetic integrals defined over the sphere using 1-D FFT

NASA Astrophysics Data System (ADS)

García, R. V.; Alejo, C. A.

2005-08-01

An iterative method is presented which performs inversion of integrals defined over the sphere. The method is based on one-dimensional fast Fourier transform (1-D FFT) inversion and is implemented with the projected Landweber technique, which is used to solve constrained least-squares problems reducing the associated 1-D cyclic-convolution error. The results obtained are as precise as the direct matrix inversion approach, but with better computational efficiency. A case study uses the inversion of Hotine’s integral to obtain gravity disturbances from geoid undulations. Numerical convergence is also analyzed and comparisons with respect to the direct matrix inversion method using conjugate gradient (CG) iteration are presented. Like the CG method, the number of iterations needed to get the optimum (i.e., small) error decreases as the measurement noise increases. Nevertheless, for discrete data given over a whole parallel band, the method can be applied directly without implementing the projected Landweber method, since no cyclic convolution error exists.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.

PubMed

Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T

2017-01-01

Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
Solutions of large-scale electromagnetics problems involving dielectric objects with the parallel multilevel fast multipole algorithm.

PubMed

Ergül, Özgür

2011-11-01

Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.
Solution of partial differential equations on vector and parallel computers

NASA Technical Reports Server (NTRS)

Ortega, J. M.; Voigt, R. G.

1985-01-01

The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Efficient stabilization and acceleration of numerical simulation of fluid flows by residual recombination

NASA Astrophysics Data System (ADS)

Citro, V.; Luchini, P.; Giannetti, F.; Auteri, F.

2017-09-01

The study of the stability of a dynamical system described by a set of partial differential equations (PDEs) requires the computation of unstable states as the control parameter exceeds its critical threshold. Unfortunately, the discretization of the governing equations, especially for fluid dynamic applications, often leads to very large discrete systems. As a consequence, matrix based methods, like for example the Newton-Raphson algorithm coupled with a direct inversion of the Jacobian matrix, lead to computational costs too large in terms of both memory and execution time. We present a novel iterative algorithm, inspired by Krylov-subspace methods, which is able to compute unstable steady states and/or accelerate the convergence to stable configurations. Our new algorithm is based on the minimization of the residual norm at each iteration step with a projection basis updated at each iteration rather than at periodic restarts like in the classical GMRES method. The algorithm is able to stabilize any dynamical system without increasing the computational time of the original numerical procedure used to solve the governing equations. Moreover, it can be easily inserted into a pre-existing relaxation (integration) procedure with a call to a single black-box subroutine. The procedure is discussed for problems of different sizes, ranging from a small two-dimensional system to a large three-dimensional problem involving the Navier-Stokes equations. We show that the proposed algorithm is able to improve the convergence of existing iterative schemes. In particular, the procedure is applied to the subcritical flow inside a lid-driven cavity. We also discuss the application of Boostconv to compute the unstable steady flow past a fixed circular cylinder (2D) and boundary-layer flow over a hemispherical roughness element (3D) for supercritical values of the Reynolds number. We show that Boostconv can be used effectively with any spatial discretization, be it a finite-difference, finite-volume, finite-element or spectral method.
Scientific and technical challenges on the road towards fusion electricity

NASA Astrophysics Data System (ADS)

Donné, A. J. H.; Federici, G.; Litaudon, X.; McDonald, D. C.

2017-10-01

The goal of the European Fusion Roadmap is to deliver fusion electricity to the grid early in the second half of this century. It breaks the quest for fusion energy into eight missions, and for each of them it describes a research and development programme to address all the open technical gaps in physics and technology and estimates the required resources. It points out the needs to intensify industrial involvement and to seek all opportunities for collaboration outside Europe. The roadmap covers three periods: the short term, which runs parallel to the European Research Framework Programme Horizon 2020, the medium term and the long term. ITER is the key facility of the roadmap as it is expected to achieve most of the important milestones on the path to fusion power. Thus, the vast majority of present resources are dedicated to ITER and its accompanying experiments. The medium term is focussed on taking ITER into operation and bringing it to full power, as well as on preparing the construction of a demonstration power plant DEMO, which will for the first time demonstrate fusion electricity to the grid around the middle of this century. Building and operating DEMO is the subject of the last roadmap phase: the long term. Clearly, the Fusion Roadmap is tightly connected to the ITER schedule. Three key milestones are the first operation of ITER, the start of the DT operation in ITER and reaching the full performance at which the thermal fusion power is 10 times the power put in to the plasma. The Engineering Design Activity of DEMO needs to start a few years after the first ITER plasma, while the start of the construction phase will be a few years after ITER reaches full performance. In this way ITER can give viable input to the design and development of DEMO. Because the neutron fluence in DEMO will be much higher than in ITER, it is important to develop and validate materials that can handle these very high neutron loads. For the testing of the materials, a dedicated 14 MeV neutron source is needed. This DEMO Oriented Neutron Source (DONES) is therefore an important facility to support the fusion roadmap.
Nonperturbative measurement of the local magnetic field using pulsed polarimetry for fusion reactor conditions (invited).

PubMed

Smith, Roger J

2008-10-01

A novel diagnostic technique for the remote and nonperturbative sensing of the local magnetic field in reactor relevant plasmas is presented. Pulsed polarimetry [Patent No. 12/150,169 (pending)] combines optical scattering with the Faraday effect. The polarimetric light detection and ranging (LIDAR)-like diagnostic has the potential to be a local B(pol) diagnostic on ITER and can achieve spatial resolutions of millimeters on high energy density (HED) plasmas using existing lasers. The pulsed polarimetry method is based on nonlocal measurements and subtle effects are introduced that are not present in either cw polarimetry or Thomson scattering LIDAR. Important features include the capability of simultaneously measuring local T(e), n(e), and B(parallel) along the line of sight, a resiliency to refractive effects, a short measurement duration providing near instantaneous data in time, and location for real-time feedback and control of magnetohydrodynamic (MHD) instabilities and the realization of a widely applicable internal magnetic field diagnostic for the magnetic fusion energy program. The technique improves for higher n(e)B(parallel) product and higher n(e) and is well suited for diagnosing the transient plasmas in the HED program. Larger devices such as ITER and DEMO are also better suited to the technique, allowing longer pulse lengths and thereby relaxing key technology constraints making pulsed polarimetry a valuable asset for next step devices. The pulsed polarimetry technique is clarified by way of illustration on the ITER tokamak and plasmas within the magnetized target fusion program within present technological means.
An improved parallel fuzzy connected image segmentation method based on CUDA.

PubMed

Wang, Liansheng; Li, Dong; Huang, Shaohui

2016-05-12

Fuzzy connectedness method (FC) is an effective method for extracting fuzzy objects from medical images. However, when FC is applied to large medical image datasets, its running time will be greatly expensive. Therefore, a parallel CUDA version of FC (CUDA-kFOE) was proposed by Ying et al. to accelerate the original FC. Unfortunately, CUDA-kFOE does not consider the edges between GPU blocks, which causes miscalculation of edge points. In this paper, an improved algorithm is proposed by adding a correction step on the edge points. The improved algorithm can greatly enhance the calculation accuracy. In the improved method, an iterative manner is applied. In the first iteration, the affinity computation strategy is changed and a look up table is employed for memory reduction. In the second iteration, the error voxels because of asynchronism are updated again. Three different CT sequences of hepatic vascular with different sizes were used in the experiments with three different seeds. NVIDIA Tesla C2075 is used to evaluate our improved method over these three data sets. Experimental results show that the improved algorithm can achieve a faster segmentation compared to the CPU version and higher accuracy than CUDA-kFOE. The calculation results were consistent with the CPU version, which demonstrates that it corrects the edge point calculation error of the original CUDA-kFOE. The proposed method has a comparable time cost and has less errors compared to the original CUDA-kFOE as demonstrated in the experimental results. In the future, we will focus on automatic acquisition method and automatic processing.
irGPU.proton.Net: Irregular strong charge interaction networks of protonatable groups in protein molecules--a GPU solver using the fast multipole method and statistical thermodynamics.

PubMed

Kantardjiev, Alexander A

2015-04-05

A cluster of strongly interacting ionization groups in protein molecules with irregular ionization behavior is suggestive for specific structure-function relationship. However, their computational treatment is unconventional (e.g., lack of convergence in naive self-consistent iterative algorithm). The stringent evaluation requires evaluation of Boltzmann averaged statistical mechanics sums and electrostatic energy estimation for each microstate. irGPU: Irregular strong interactions in proteins--a GPU solver is novel solution to a versatile problem in protein biophysics--atypical protonation behavior of coupled groups. The computational severity of the problem is alleviated by parallelization (via GPU kernels) which is applied for the electrostatic interaction evaluation (including explicit electrostatics via the fast multipole method) as well as statistical mechanics sums (partition function) estimation. Special attention is given to the ease of the service and encapsulation of theoretical details without sacrificing rigor of computational procedures. irGPU is not just a solution-in-principle but a promising practical application with potential to entice community into deeper understanding of principles governing biomolecule mechanisms. © 2015 Wiley Periodicals, Inc.
Morphological-transformation-based technique of edge detection and skeletonization of an image using a single spatial light modulator

NASA Astrophysics Data System (ADS)

Munshi, Soumika; Datta, A. K.

2003-03-01

A technique of optically detecting the edge and skeleton of an image by defining shift operations for morphological transformation is described. A (2 × 2) source array, which acts as the structuring element of morphological operations, casts four angularly shifted optical projections of the input image. The resulting dilated image, when superimposed with the complementary input image, produces the edge image. For skeletonization, the source array casts four partially overlapped output images of the inverted input image, which is negated, and the resultant image is recorded in a CCD camera. This overlapped eroded image is again eroded and then dilated, producing an opened image. The difference between the eroded and opened image is then computed, resulting in a thinner image. This procedure of obtaining a thinned image is iterated until the difference image becomes zero, maintaining the connectivity conditions. The technique has been optically implemented using a single spatial modulator and has the advantage of single-instruction parallel processing of the image. The techniques have been tested both for binary and grey images.
Optimal parallel solution of sparse triangular systems

NASA Technical Reports Server (NTRS)

Alvarado, Fernando L.; Schreiber, Robert

1990-01-01

A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-handed sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-handed sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allow for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.

Computer program MCAP-TOSS calculates steady-state fluid dynamics of coolant in parallel channels and temperature distribution in surrounding heat-generating solid

NASA Technical Reports Server (NTRS)

Lee, A. Y.

1967-01-01

Computer program calculates the steady state fluid distribution, temperature rise, and pressure drop of a coolant, the material temperature distribution of a heat generating solid, and the heat flux distributions at the fluid-solid interfaces. It performs the necessary iterations automatically within the computer, in one machine run.
Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Hixon, Duane; Sankar, L. N.

1993-01-01

During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
A COMPARISON OF TRANSIENT INFINITE ELEMENTS AND TRANSIENT KIRCHHOFF INTEGRAL METHODS FOR FAR FIELD ACOUSTIC ANALYSIS

DOE PAGES

WALSH, TIMOTHY F.; JONES, ANDREA; BHARDWAJ, MANOJ; ...

2013-04-01

Finite element analysis of transient acoustic phenomena on unbounded exterior domains is very common in engineering analysis. In these problems there is a common need to compute the acoustic pressure at points outside of the acoustic mesh, since meshing to points of interest is impractical in many scenarios. In aeroacoustic calculations, for example, the acoustic pressure may be required at tens or hundreds of meters from the structure. In these cases, a method is needed for post-processing the acoustic results to compute the response at far-field points. In this paper, we compare two methods for computing far-field acoustic pressures, onemore » derived directly from the infinite element solution, and the other from the transient version of the Kirchhoff integral. Here, we show that the infinite element approach alleviates the large storage requirements that are typical of Kirchhoff integral and related procedures, and also does not suffer from loss of accuracy that is an inherent part of computing numerical derivatives in the Kirchhoff integral. In order to further speed up and streamline the process of computing the acoustic response at points outside of the mesh, we also address the nonlinear iterative procedure needed for locating parametric coordinates within the host infinite element of far-field points, the parallelization of the overall process, linear solver requirements, and system stability considerations.« less
A New Quality Control Method base on IRMCD for Wind Profiler Observation towards Future Assimilation Application

NASA Astrophysics Data System (ADS)

Chen, Min; Zhang, Yu

2017-04-01

A wind profiler network with a total of 65 profiling radars was operated by the MOC/CMA in China until July 2015. In this study, a quality control procedure is constructed to incorporate the profiler data from the wind-profiling network into the local data assimilation and forecasting system (BJRUC). The procedure applies a blacklisting check that removes stations with gross errors and an outlier check that rejects data with large deviations from the background. Instead of the bi-weighting method, which has been commonly implemented in outlier elimination for one-dimensional scalar observations, an outlier elimination method is developed based on the iterated reweighted minimum covariance determinant (IRMCD) for multi-variate observations such as wind profiler data. A quality control experiment is separately performed for subsets containing profiler data tagged in parallel with/without rain flags at every 00UTC/12UTC from 20 June to 30 Sep 2015. From the results, we find that with the quality control, the frequency distributions of the differences between the observations and model background become more Gaussian-like and meet the requirements of a Gaussian distribution for data assimilation. Further intensive assessment for each quality control step reveals that the stations rejected by blacklisting contain poor data quality, and the IRMCD rejects outliers in a robust and physically reasonable manner.
Optimizing transformations of stencil operations for parallel cache-based architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bassetti, F.; Davis, K.

This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicity in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation andmore » applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has bee n performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.« less
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.

PubMed

Slażyński, Leszek; Bohte, Sander

2012-01-01

The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.
Towards the Experimental Assessment of the DQE in SPECT Scanners

NASA Astrophysics Data System (ADS)

Fountos, G. P.; Michail, C. M.

2017-11-01

The purpose of this work was to introduce the Detective Quantum Efficiency (DQE) in single photon emission computed tomography (SPECT) systems using a flood source. A Tc-99m-based flood source (Eγ = 140 keV) consisting of a radiopharmaceutical solution of dithiothreitol (DTT, 10-3 M)/Tc-99m(III)-DMSA, 40 mCi/40 ml bound to the grains of an Agfa MammoRay HDR Medical X-ray film) was prepared in laboratory. The source was placed between two PMMA blocks and images were obtained by using the brain tomographic acquisition protocol (DatScan-brain). The Modulation Transfer Function (MTF) was evaluated using the Iterative 2D algorithm. All imaging experiments were performed in a Siemens e-Cam gamma camera. The Normalized Noise Power spectra (NNPS) were obtained from the sagittal views of the source. The higher MTF values were obtained for the Flash Iterative 2D with 24 iterations and 20 subsets. The noise levels of the SPECT reconstructed images, in terms of the NNPS, were found to increase as the number of iterations increase. The behavior of the DQE was influenced by both MTF and NNPS. As the number of iterations was increased, higher MTF values were obtained, however with a parallel, increase of magnitude in image noise, as depicted from the NNPS results. DQE values, which were influenced by both MTF and NNPS, were found higher when the number of iterations results in resolution saturation. The method presented here is novel and easy to implement, requiring materials commonly found in clinical practice and can be useful in the quality control of SPECT scanners.
The effect of pre-existing islands on disruption mitigation in MHD simulations of DIII-D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Izzo, V. A.

Locked-modes are the most likely cause of disruptions in ITER, so large islands are expected to be common when the ITER disruption mitigation system is deployed. MHD modeling of disruption mitigation by massive gas injection is carried out for DIII-D plasmas with stationary, pre-existing islands. Results show that the magnetic topology at the q=2 surface can affect the parallel spreading of injected impurities, and that, in particular, the break-up of large 2/1 islands into smaller 4/2 islands chains can favorably affect mitigation metrics. The direct imposition of a 4/2 mode is found to have similar results to the case inmore » which the 4/2 harmonic grows spontaneously.« less
The effect of pre-existing islands on disruption mitigation in MHD simulations of DIII-D

DOE PAGES

Izzo, V. A.

2017-02-27

Locked-modes are the most likely cause of disruptions in ITER, so large islands are expected to be common when the ITER disruption mitigation system is deployed. MHD modeling of disruption mitigation by massive gas injection is carried out for DIII-D plasmas with stationary, pre-existing islands. Results show that the magnetic topology at the q=2 surface can affect the parallel spreading of injected impurities, and that, in particular, the break-up of large 2/1 islands into smaller 4/2 islands chains can favorably affect mitigation metrics. The direct imposition of a 4/2 mode is found to have similar results to the case inmore » which the 4/2 harmonic grows spontaneously.« less
Projection matrix acquisition for cone-beam computed tomography iterative reconstruction

NASA Astrophysics Data System (ADS)

Yang, Fuqiang; Zhang, Dinghua; Huang, Kuidong; Shi, Wenlong; Zhang, Caixin; Gao, Zongzhao

2017-02-01

Projection matrix is an essential and time-consuming part in computed tomography (CT) iterative reconstruction. In this article a novel calculation algorithm of three-dimensional (3D) projection matrix is proposed to quickly acquire the matrix for cone-beam CT (CBCT). The CT data needed to be reconstructed is considered as consisting of the three orthogonal sets of equally spaced and parallel planes, rather than the individual voxels. After getting the intersections the rays with the surfaces of the voxels, the coordinate points and vertex is compared to obtain the index value that the ray traversed. Without considering ray-slope to voxel, it just need comparing the position of two points. Finally, the computer simulation is used to verify the effectiveness of the algorithm.
A conjugate gradient method for solving the non-LTE line radiation transfer problem

NASA Astrophysics Data System (ADS)

Paletou, F.; Anterrieu, E.

2009-12-01

This study concerns the fast and accurate solution of the line radiation transfer problem, under non-LTE conditions. We propose and evaluate an alternative iterative scheme to the classical ALI-Jacobi method, and to the more recently proposed Gauss-Seidel and successive over-relaxation (GS/SOR) schemes. Our study is indeed based on applying a preconditioned bi-conjugate gradient method (BiCG-P). Standard tests, in 1D plane parallel geometry and in the frame of the two-level atom model with monochromatic scattering are discussed. Rates of convergence between the previously mentioned iterative schemes are compared, as are their respective timing properties. The smoothing capability of the BiCG-P method is also demonstrated.
Iterative combining rules for the van der Waals potentials of mixed rare gas systems

NASA Astrophysics Data System (ADS)

Wei, L. M.; Li, P.; Tang, K. T.

2017-05-01

An iterative procedure is introduced to make the results of some simple combining rules compatible with the Tang-Toennies potential model. The method is used to calculate the well locations Re and the well depths De of the van der Waals potentials of the mixed rare gas systems from the corresponding values of the homo-nuclear dimers. When the ;sizes; of the two interacting atoms are very different, several rounds of iteration are required for the results to converge. The converged results can be substantially different from the starting values obtained from the combining rules. However, if the sizes of the interacting atoms are close, only one or even no iteration is necessary for the results to converge. In either case, the converged results are the accurate descriptions of the interaction potentials of the hetero-nuclear dimers.
Accelerating the weighted histogram analysis method by direct inversion in the iterative subspace.

PubMed

Zhang, Cheng; Lai, Chun-Liang; Pettitt, B Montgomery

The weighted histogram analysis method (WHAM) for free energy calculations is a valuable tool to produce free energy differences with the minimal errors. Given multiple simulations, WHAM obtains from the distribution overlaps the optimal statistical estimator of the density of states, from which the free energy differences can be computed. The WHAM equations are often solved by an iterative procedure. In this work, we use a well-known linear algebra algorithm which allows for more rapid convergence to the solution. We find that the computational complexity of the iterative solution to WHAM and the closely-related multiple Bennett acceptance ratio (MBAR) method can be improved by using the method of direct inversion in the iterative subspace. We give examples from a lattice model, a simple liquid and an aqueous protein solution.
Block-Parallel Data Analysis with DIY2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE PAGES

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...

2017-11-14

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

NASA Astrophysics Data System (ADS)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.

2017-11-01

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
A Centered Projective Algorithm for Linear Programming

DTIC Science & Technology

1988-02-01

zx/l to (PA Karmarkar’s algorithm iterates this procedure. An alternative method, the so-called affine variant (first proposed by Dikin [6] in 1967...trajectories, II. Legendre transform coordinates . central trajectories," manuscripts, to appear in Transactions of the American [6] I.I. Dikin ...34Iterative solution of problems of linear and quadratic programming," Soviet Mathematics Dokladv 8 (1967), 674-675. [7] I.I. Dikin , "On the speed of an
MPL-A program for computations with iterated integrals on moduli spaces of curves of genus zero

NASA Astrophysics Data System (ADS)

Bogner, Christian

2016-06-01

We introduce the Maple program MPL for computations with multiple polylogarithms. The program is based on homotopy invariant iterated integrals on moduli spaces M0,n of curves of genus 0 with n ordered marked points. It includes the symbol map and procedures for the analytic computation of period integrals on M0,n. It supports the automated computation of a certain class of Feynman integrals.
Increasing airport capacity with modified IFR approach procedures for close-spaced parallel runways

DOT National Transportation Integrated Search

2001-01-01

Because of wake turbulence considerations, current instrument approach : procedures treat close-spaced (i.e., less than 2,500 feet apart) parallel run : ways as a single runway. This restriction is designed to assure safety for all : aircraft types u...

S-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Simon, H.

1998-01-01

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less
Noniterative accurate algorithm for the exact exchange potential of density-functional theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cinal, M.; Holas, A.

2007-10-15

An algorithm for determination of the exchange potential is constructed and tested. It represents a one-step procedure based on the equations derived by Krieger, Li, and Iafrate (KLI) [Phys. Rev. A 46, 5453 (1992)], implemented already as an iterative procedure by Kuemmel and Perdew [Phys. Rev. Lett. 90, 043004 (2003)]. Due to suitable transformation of the KLI equations, we can solve them avoiding iterations. Our algorithm is applied to the closed-shell atoms, from Be up to Kr, within the DFT exchange-only approximation. Using pseudospectral techniques for representing orbitals, we obtain extremely accurate values of total and orbital energies with errorsmore » at least four orders of magnitude smaller than known in the literature.« less
A Block Iterative Finite Element Model for Nonlinear Leaky Aquifer Systems

NASA Astrophysics Data System (ADS)

Gambolati, Giuseppe; Teatini, Pietro

1996-01-01

A new quasi three-dimensional finite element model of groundwater flow is developed for highly compressible multiaquifer systems where aquitard permeability and elastic storage are dependent on hydraulic drawdown. The model is solved by a block iterative strategy, which is naturally suggested by the geological structure of the porous medium and can be shown to be mathematically equivalent to a block Gauss-Seidel procedure. As such it can be generalized into a block overrelaxation procedure and greatly accelerated by the use of the optimum overrelaxation factor. Results for both linear and nonlinear multiaquifer systems emphasize the excellent computational performance of the model and indicate that convergence in leaky systems can be improved up to as much as one order of magnitude.
Preconditioned conjugate residual methods for the solution of spectral equations

NASA Technical Reports Server (NTRS)

Wong, Y. S.; Zang, T. A.; Hussaini, M. Y.

1986-01-01

Conjugate residual methods for the solution of spectral equations are described. An inexact finite-difference operator is introduced as a preconditioner in the iterative procedures. Application of these techniques is limited to problems for which the symmetric part of the coefficient matrix is positive definite. Although the spectral equation is a very ill-conditioned and full matrix problem, the computational effort of the present iterative methods for solving such a system is comparable to that for the sparse matrix equations obtained from the application of either finite-difference or finite-element methods to the same problems. Numerical experiments are shown for a self-adjoint elliptic partial differential equation with Dirichlet boundary conditions, and comparison with other solution procedures for spectral equations is presented.
A new solution procedure for a nonlinear infinite beam equation of motion

NASA Astrophysics Data System (ADS)

Jang, T. S.

2016-10-01

Our goal of this paper is of a purely theoretical question, however which would be fundamental in computational partial differential equations: Can a linear solution-structure for the equation of motion for an infinite nonlinear beam be directly manipulated for constructing its nonlinear solution? Here, the equation of motion is modeled as mathematically a fourth-order nonlinear partial differential equation. To answer the question, a pseudo-parameter is firstly introduced to modify the equation of motion. And then, an integral formalism for the modified equation is found here, being taken as a linear solution-structure. It enables us to formulate a nonlinear integral equation of second kind, equivalent to the original equation of motion. The fixed point approach, applied to the integral equation, results in proposing a new iterative solution procedure for constructing the nonlinear solution of the original beam equation of motion, which consists luckily of just the simple regular numerical integration for its iterative process; i.e., it appears to be fairly simple as well as straightforward to apply. A mathematical analysis is carried out on both natures of convergence and uniqueness of the iterative procedure by proving a contractive character of a nonlinear operator. It follows conclusively,therefore, that it would be one of the useful nonlinear strategies for integrating the equation of motion for a nonlinear infinite beam, whereby the preceding question may be answered. In addition, it may be worth noticing that the pseudo-parameter introduced here has double roles; firstly, it connects the original beam equation of motion with the integral equation, second, it is related with the convergence of the iterative method proposed here.
Constructing Integrable Full-pressure Full-current Free-boundary Stellarator Magnetohydrodynamic Equilibria

NASA Astrophysics Data System (ADS)

Hudson, S. R.; Monticello, D. A.; Reiman, A. H.; Strickler, D. J.; Hirshman, S. P.

2003-06-01

For the (non-axisymmetric) stellarator class of plasma confinement devices to be feasible candidates for fusion power stations it is essential that, to a good approximation, the magnetic field lines lie on nested flux surfaces; however, the inherent lack of a continuous symmetry implies that magnetic islands are guaranteed to exist. Magnetic islands break the smooth topology of nested flux surfaces and chaotic field lines result when magnetic islands overlap. An analogous case occurs with 11/2-dimension Hamiltonian systems where resonant perturbations cause singularities in the transformation to action-angle coordinates and destroy integrability. The suppression of magnetic islands is a critical issue for stellarator design, particularly for small aspect ratio devices. Techniques for `healing' vacuum fields and fixed-boundary plasma equilibria have been developed, but what is ultimately required is a procedure for designing stellarators such that the self-consistent plasma equilibrium currents and the coil currents combine to produce an integrable magnetic field, and such a procedure is presented here for the first time. Magnetic islands in free-boundary full-pressure full-current stellarator magnetohydrodynamic equilibria are suppressed using a procedure based on the Princeton Iterative Equilibrium Solver [A.H.Reiman & H.S.Greenside, Comp. Phys. Comm., 43:157, 1986.] which iterates the equilibrium equations to obtain the plasma equilibrium. At each iteration, changes to a Fourier representation of the coil geometry are made to cancel resonant fields produced by the plasma. As the iterations continue, the coil geometry and the plasma simultaneously converge to an equilibrium in which the island content is negligible. The method is applied to a candidate plasma and coil design for the National Compact Stellarator eXperiment [G.H.Neilson et.al., Phys. Plas., 7:1911, 2000.].
Efficient and robust relaxation procedures for multi-component mixtures including phase transition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Han, Ee, E-mail: eehan@math.uni-bremen.de; Hantke, Maren, E-mail: maren.hantke@ovgu.de; Müller, Siegfried, E-mail: mueller@igpm.rwth-aachen.de

We consider a thermodynamic consistent multi-component model in multi-dimensions that is a generalization of the classical two-phase flow model of Baer and Nunziato. The exchange of mass, momentum and energy between the phases is described by additional source terms. Typically these terms are handled by relaxation procedures. Available relaxation procedures suffer from efficiency and robustness resulting in very costly computations that in general only allow for one-dimensional computations. Therefore we focus on the development of new efficient and robust numerical methods for relaxation processes. We derive exact procedures to determine mechanical and thermal equilibrium states. Further we introduce a novelmore » iterative method to treat the mass transfer for a three component mixture. All new procedures can be extended to an arbitrary number of inert ideal gases. We prove existence, uniqueness and physical admissibility of the resulting states and convergence of our new procedures. Efficiency and robustness of the procedures are verified by means of numerical computations in one and two space dimensions. - Highlights: • We develop novel relaxation procedures for a generalized, thermodynamically consistent Baer–Nunziato type model. • Exact procedures for mechanical and thermal relaxation procedures avoid artificial parameters. • Existence, uniqueness and physical admissibility of the equilibrium states are proven for special mixtures. • A novel iterative method for mass transfer is introduced for a three component mixture providing a unique and admissible equilibrium state.« less
Development of a pressure based multigrid solution method for complex fluid flows

NASA Technical Reports Server (NTRS)

Shyy, Wei

1991-01-01

In order to reduce the computational difficulty associated with a single grid (SG) solution procedure, the multigrid (MG) technique was identified as a useful means for improving the convergence rate of iterative methods. A full MG full approximation storage (FMG/FAS) algorithm is used to solve the incompressible recirculating flow problems in complex geometries. The algorithm is implemented in conjunction with a pressure correction staggered grid type of technique using the curvilinear coordinates. In order to show the performance of the method, two flow configurations, one a square cavity and the other a channel, are used as test problems. Comparisons are made between the iterations, equivalent work units, and CPU time. Besides showing that the MG method can yield substantial speed-up with wide variations in Reynolds number, grid distributions, and geometry, issues such as the convergence characteristics of different grid levels, the choice of convection schemes, and the effectiveness of the basic iteration smoothers are studied. An adaptive grid scheme is also combined with the MG procedure to explore the effects of grid resolution on the MG convergence rate as well as the numerical accuracy.
Fast solution of elliptic partial differential equations using linear combinations of plane waves.

PubMed

Pérez-Jordá, José M

2016-02-01

Given an arbitrary elliptic partial differential equation (PDE), a procedure for obtaining its solution is proposed based on the method of Ritz: the solution is written as a linear combination of plane waves and the coefficients are obtained by variational minimization. The PDE to be solved is cast as a system of linear equations Ax=b, where the matrix A is not sparse, which prevents the straightforward application of standard iterative methods in order to solve it. This sparseness problem can be circumvented by means of a recursive bisection approach based on the fast Fourier transform, which makes it possible to implement fast versions of some stationary iterative methods (such as Gauss-Seidel) consuming O(NlogN) memory and executing an iteration in O(Nlog(2)N) time, N being the number of plane waves used. In a similar way, fast versions of Krylov subspace methods and multigrid methods can also be implemented. These procedures are tested on Poisson's equation expressed in adaptive coordinates. It is found that the best results are obtained with the GMRES method using a multigrid preconditioner with Gauss-Seidel relaxation steps.
Experiments on Learning by Back Propagation.

ERIC Educational Resources Information Center

Plaut, David C.; And Others

This paper describes further research on a learning procedure for layered networks of deterministic, neuron-like units, described by Rumelhart et al. The units, the way they are connected, the learning procedure, and the extension to iterative networks are presented. In one experiment, a network learns a set of filters, enabling it to discriminate…
How good are the Garvey-Kelson predictions of nuclear masses?

NASA Astrophysics Data System (ADS)

Morales, Irving O.; López Vieyra, J. C.; Hirsch, J. G.; Frank, A.

2009-09-01

The Garvey-Kelson relations are used in an iterative process to predict nuclear masses in the neighborhood of nuclei with measured masses. Average errors in the predicted masses for the first three iteration shells are smaller than those obtained with the best nuclear mass models. Their quality is comparable with the Audi-Wapstra extrapolations, offering a simple and reproducible procedure for short range mass predictions. A systematic study of the way the error grows as a function of the iteration and the distance to the known masses region, shows that a correlation exists between the error and the residual neutron-proton interaction, produced mainly by the implicit assumption that V varies smoothly along the nuclear landscape.
Parallel Treatments Design: A Nested Single Subject Design for Comparing Instructional Procedures.

ERIC Educational Resources Information Center

Gast, David L.; Wolery, Mark

1988-01-01

This paper describes the parallel treatments design, a nested single subject experimental design that combines two concurrently implemented multiple probe designs, allows control for effects of extraneous variables through counterbalancing, and replicates its effects across behaviors. Procedural guidelines for the design's use and issues related…
Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Sankar, Lakshmi; Hixon, Duane

1993-01-01

The work done under this project was documented in detail as the Ph. D. dissertation of Dr. Duane Hixon. The objectives of the research project were evaluation of the generalized minimum residual method (GMRES) as a tool for accelerating 2-D and 3-D unsteady flows and evaluation of the suitability of the GMRES algorithm for unsteady flows, computed on parallel computer architectures.
Methods for design and evaluation of integrated hardware-software systems for concurrent computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

Research activities and publications are briefly summarized. The major tasks reviewed are: (1) VAX implementation of the PISCES parallel programming environment; (2) Apollo workstation network implementation of the PISCES environment; (3) FLEX implementation of the PISCES environment; (4) sparse matrix iterative solver in PSICES Fortran; (5) image processing application of PISCES; and (6) a formal model of concurrent computation being developed.
Iterative Procedures for Exact Maximum Likelihood Estimation in the First-Order Gaussian Moving Average Model

DTIC Science & Technology

1990-11-01

1 = Q- 1 - 1 QlaaQ- 1.1 + a’Q-1a This is a simple case of a general formula called Woodbury’s formula by some authors; see, for example, Phadke and...1 2. The First-Order Moving Average Model ..... .................. 3. Some Approaches to the Iterative...the approximate likelihood function in some time series models. Useful suggestions have been the Cholesky decomposition of the covariance matrix and
Burning plasma regime for Fussion-Fission Research Facility

NASA Astrophysics Data System (ADS)

Zakharov, Leonid E.

2010-11-01

The basic aspects of burning plasma regimes of Fusion-Fission Research Facility (FFRF, R/a=4/1 m/m, Ipl=5 MA, Btor=4-6 T, P^DT=50-100 MW, P^fission=80-4000 MW, 1 m thick blanket), which is suggested as the next step device for Chinese fusion program, are presented. The mission of FFRF is to advance magnetic fusion to the level of a stationary neutron source and to create a technical, scientific, and technology basis for the utilization of high-energy fusion neutrons for the needs of nuclear energy and technology. FFRF will rely as much as possible on ITER design. Thus, the magnetic system, especially TFC, will take advantage of ITER experience. TFC will use the same superconductor as ITER. The plasma regimes will represent an extension of the stationary plasma regimes on HT-7 and EAST tokamaks at ASIPP. Both inductive discharges and stationary non-inductive Lower Hybrid Current Drive (LHCD) will be possible. FFRF strongly relies on new, Lithium Wall Fusion (LiWF) plasma regimes, the development of which will be done on NSTX, HT-7, EAST in parallel with the design work. This regime will eliminate a number of uncertainties, still remaining unresolved in the ITER project. Well controlled, hours long inductive current drive operation at P^DT=50-100 MW is predicted.
Functional materials for breeding blankets—status and developments

NASA Astrophysics Data System (ADS)

Konishi, S.; Enoeda, M.; Nakamichi, M.; Hoshino, T.; Ying, A.; Sharafat, S.; Smolentsev, S.

2017-09-01

The development of tritium breeder, neutron multiplier and flow channel insert materials for the breeding blanket of the DEMO reactor is reviewed. Present emphasis is on the ITER test blanket module (TBM); lithium metatitanate (Li2TiO3) and lithium orthosilicate (Li4SiO4) pebbles have been developed by leading TBM parties. Beryllium pebbles have been selected as the neutron multiplier. Good progress has been made in their fabrication; however, verification of the design by experiments is in the planning stage. Irradiation data are also limited, but the decrease in thermal conductivity of beryllium due to irradiation followed by swelling is a concern. Tests at ITER are regarded as a major milestone. For the DEMO reactor, improvement of the breeder has been attempted to obtain a higher lithium content, and Be12Ti and other beryllide intermetallic compounds that have superior chemical stability have been studied. LiPb eutectic has been considered as a DEMO blanket in the liquid breeder option and is used as a coolant to achieve a higher outlet temperature; a SiC flow channel insert is used to prevent magnetohydrodynamic pressure drop and corrosion. A significant technical gap between ITER TBM and DEMO is recognized, and the world fusion community is working on ITER TBM and DEMO blanket development in parallel.
Iterative methods for plasma sheath calculations: Application to spherical probe

NASA Technical Reports Server (NTRS)

Parker, L. W.; Sullivan, E. C.

1973-01-01

The computer cost of a Poisson-Vlasov iteration procedure for the numerical solution of a steady-state collisionless plasma-sheath problem depends on: (1) the nature of the chosen iterative algorithm, (2) the position of the outer boundary of the grid, and (3) the nature of the boundary condition applied to simulate a condition at infinity (as in three-dimensional probe or satellite-wake problems). Two iterative algorithms, in conjunction with three types of boundary conditions, are analyzed theoretically and applied to the computation of current-voltage characteristics of a spherical electrostatic probe. The first algorithm was commonly used by physicists, and its computer costs depend primarily on the boundary conditions and are only slightly affected by the mesh interval. The second algorithm is not commonly used, and its costs depend primarily on the mesh interval and slightly on the boundary conditions.
External heating and current drive source requirements towards steady-state operation in ITER

NASA Astrophysics Data System (ADS)

Poli, F. M.; Kessel, C. E.; Bonoli, P. T.; Batchelor, D. B.; Harvey, R. W.; Snyder, P. B.

2014-07-01

Steady state scenarios envisaged for ITER aim at optimizing the bootstrap current, while maintaining sufficient confinement and stability to provide the necessary fusion yield. Non-inductive scenarios will need to operate with internal transport barriers (ITBs) in order to reach adequate fusion gain at typical currents of 9 MA. However, the large pressure gradients associated with ITBs in regions of weak or negative magnetic shear can be conducive to ideal MHD instabilities, reducing the no-wall limit. The E × B flow shear from toroidal plasma rotation is expected to be low in ITER, with a major role in the ITB dynamics being played by magnetic geometry. Combinations of heating and current drive (H/CD) sources that sustain reversed magnetic shear profiles throughout the discharge are the focus of this work. Time-dependent transport simulations indicate that a combination of electron cyclotron (EC) and lower hybrid (LH) waves is a promising route towards steady state operation in ITER. The LH forms and sustains expanded barriers and the EC deposition at mid-radius freezes the bootstrap current profile stabilizing the barrier and leading to confinement levels 50% higher than typical H-mode energy confinement times. Using LH spectra with spectrum centred on parallel refractive index of 1.75-1.85, the performance of these plasma scenarios is close to the ITER target of 9 MA non-inductive current, global confinement gain H98 = 1.6 and fusion gain Q = 5.
Iterative refinement of structure-based sequence alignments by Seed Extension

PubMed Central

Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook

2009-01-01

Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133

Transformation of two and three-dimensional regions by elliptic systems

NASA Technical Reports Server (NTRS)

Mastin, C. Wayne

1991-01-01

A reliable linear system is presented for grid generation in 2-D and 3-D. The method is robust in the sense that convergence is guaranteed but is not as reliable as other nonlinear elliptic methods in generating nonfolding grids. The construction of nonfolding grids depends on having reasonable approximations of cell aspect ratios and an appropriate distribution of grid points on the boundary of the region. Some guidelines are included on approximating the aspect ratios, but little help is offered on setting up the boundary grid other than to say that in 2-D the boundary correspondence should be close to that generated by a conformal mapping. It is assumed that the functions which control the grid distribution depend only on the computational variables and not on the physical variables. Whether this is actually the case depends on how the grid is constructed. In a dynamic adaptive procedure where the grid is constructed in the process of solving a fluid flow problem, the grid is usually updated at fixed iteration counts using the current value of the control function. Since the control function is not being updated during the iteration of the grid equations, the grid construction is a linear procedure. However, in the case of a static adaptive procedure where a trial solution is computed and used to construct an adaptive grid, the control functions may be recomputed at every step of the grid iteration.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

2013-07-01

In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less
Krylov subspace methods on supercomputers

NASA Technical Reports Server (NTRS)

Saad, Youcef

1988-01-01

A short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers is presented. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to derive effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is in the solution of the triangular systems at each step. A few approaches consisting of implementing efficient forward and backward triangular solutions are described in detail. Polynomial preconditioning as an alternative to standard incomplete factorization techniques is also discussed. Another efficient approach is to reorder the equations so as to improve the structure of the matrix to achieve better parallelism or vectorization. An overview of these and other ideas and their effectiveness or potential for different types of architectures is given.
Sequential and parallel image restoration: neural network implementations.

PubMed

Figueiredo, M T; Leitao, J N

1994-01-01

Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high dimension convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem into the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm supported on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.
Reducing neural network training time with parallel processing

NASA Technical Reports Server (NTRS)

Rogers, James L., Jr.; Lamarsh, William J., II

1995-01-01

Obtaining optimal solutions for engineering design problems is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Previous research has shown that a near optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive neural network. A new approach has been developed to further reduce this time. This approach decomposes a large neural network into many smaller neural networks that can be trained in parallel. Guidelines are developed to avoid some of the pitfalls when training smaller neural networks in parallel. These guidelines allow the engineer: to determine the number of nodes on the hidden layer of the smaller neural networks; to choose the initial training weights; and to select a network configuration that will capture the interactions among the smaller neural networks. This paper presents results describing how these guidelines are developed.
Considering Horn's Parallel Analysis from a Random Matrix Theory Point of View.

PubMed

Saccenti, Edoardo; Timmerman, Marieke E

2017-03-01

Horn's parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy-Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy-Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy-Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy-Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.
Configuration affects parallel stent grafting results.

PubMed

Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

2018-05-01

A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31.25% of retrograde stents having any complication. Parallel stent grafting offers an off-the-shelf option to treat a variety of aortic diseases. There is an increased risk of parallel stent and overall EVAR compromise with <10% main body oversizing. Thirty-day mortality is increased when more than one parallel stent is placed. Antegrade configurations are preferred to any retrograde configuration, with optimal oversizing >20%. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
Use of Multi-class Empirical Orthogonal Function for Identification of Hydrogeological Parameters and Spatiotemporal Pattern of Multiple Recharges in Groundwater Modeling

NASA Astrophysics Data System (ADS)

Huang, C. L.; Hsu, N. S.; Yeh, W. W. G.; Hsieh, I. H.

2017-12-01

This study develops an innovative calibration method for regional groundwater modeling by using multi-class empirical orthogonal functions (EOFs). The developed method is an iterative approach. Prior to carrying out the iterative procedures, the groundwater storage hydrographs associated with the observation wells are calculated. The combined multi-class EOF amplitudes and EOF expansion coefficients of the storage hydrographs are then used to compute the initial gauss of the temporal and spatial pattern of multiple recharges. The initial guess of the hydrogeological parameters are also assigned according to in-situ pumping experiment. The recharges include net rainfall recharge and boundary recharge, and the hydrogeological parameters are riverbed leakage conductivity, horizontal hydraulic conductivity, vertical hydraulic conductivity, storage coefficient, and specific yield. The first step of the iterative algorithm is to conduct the numerical model (i.e. MODFLOW) by the initial guess / adjusted values of the recharges and parameters. Second, in order to determine the best EOF combination of the error storage hydrographs for determining the correction vectors, the objective function is devised as minimizing the root mean square error (RMSE) of the simulated storage hydrographs. The error storage hydrograph are the differences between the storage hydrographs computed from observed and simulated groundwater level fluctuations. Third, adjust the values of recharges and parameters and repeat the iterative procedures until the stopping criterion is reached. The established methodology was applied to the groundwater system of Ming-Chu Basin, Taiwan. The study period is from January 1st to December 2ed in 2012. Results showed that the optimal EOF combination for the multiple recharges and hydrogeological parameters can decrease the RMSE of the simulated storage hydrographs dramatically within three calibration iterations. It represents that the iterative approach that using EOF techniques can capture the groundwater flow tendency and detects the correction vector of the simulated error sources. Hence, the established EOF-based methodology can effectively and accurately identify the multiple recharges and hydrogeological parameters.
String-averaging incremental subgradients for constrained convex optimization with applications to reconstruction of tomographic images

NASA Astrophysics Data System (ADS)

Massambone de Oliveira, Rafael; Salomão Helou, Elias; Fontoura Costa, Eduardo

2016-11-01

We present a method for non-smooth convex minimization which is based on subgradient directions and string-averaging techniques. In this approach, the set of available data is split into sequences (strings) and a given iterate is processed independently along each string, possibly in parallel, by an incremental subgradient method (ISM). The end-points of all strings are averaged to form the next iterate. The method is useful to solve sparse and large-scale non-smooth convex optimization problems, such as those arising in tomographic imaging. A convergence analysis is provided under realistic, standard conditions. Numerical tests are performed in a tomographic image reconstruction application, showing good performance for the convergence speed when measured as the decrease ratio of the objective function, in comparison to classical ISM.
Non-LTE models for synthetic spectra of type Ia supernovae. III. An accelerated lambda-iteration procedure for the mutual interaction of strong spectral lines in SN Ia models with and without energy deposition

NASA Astrophysics Data System (ADS)

Pauldrach, A. W. A.; Hoffmann, T. L.; Hultzsch, P. J. N.

2014-09-01

Context. In type Ia supernova (SN Ia) envelopes a huge number of lines of different elements overlap within their thermal Doppler widths, and this problem is exacerbated by the circumstance that up to 20% of these lines can have a line optical depth higher than 1. The stagnation of the lambda iteration in such optically thick cases is one of the fundamental physical problems inherent in the iterative solution of the non-LTE problem, and the failure of a lambda iteration to converge is a point of crucial importance whose physical significance must be understood completely. Aims: We discuss a general problem related to radiative transfer under the physical conditions of supernova ejecta that involves a failure of the usual non-LTE iteration scheme to converge when multiple strong opacities belonging to different physical transitions come together, similar to the well-known situation where convergence is impaired even when only a single process attains high optical depths. The convergence problem is independent of the chosen frequency and depth grid spacing, independent of whether the radiative transfer is solved in the comoving or observer's frame, and independent of whether a common complete-linearization scheme or a conventional accelerated lambda iteration (ALI) is used. The problem appears when all millions of line transitions required for a realistic description of SN Ia envelopes are treated in the frame of a comprehensive non-LTE model. The only solution to this problem is a complete-linearization approach that considers all ions of all elements simultaneously, or an adequate generalization of the established ALI technique that accounts for the mutual interaction of the strong spectral lines of different elements and which thereby unfreezes the "stuck" state of the iteration. Methods: The physics of the atmospheres of SN Ia are strongly affected by the high-velocity expansion of the ejecta, which dominates the formation of the spectra at all wavelength ranges. Thus, hydrodynamic explosion models and realistic model atmospheres that take into account the strong deviation from local thermodynamic equilibrium (LTE) are necessary for the synthesis and analysis of the spectra. In this regard one of the biggest challenges we have found in modeling the radiative transfer in SN Ia is the fact that the radiative energy in the UV has to be transferred only via spectral lines into the optical regime to be able to leave the ejecta. However, convergence of the model toward a state where this is possible is impaired when using the standard procedures. We report on improvements in our approach of computing synthetic spectra for SN Ia with respect to (i) an improved and sophisticated treatment of many thousands of strong lines that interact intricately with the "pseudo-continuum" formed entirely by Doppler-shifted spectral lines; (ii) an improved and expanded atomic database; and (iii) the inclusion of energy deposition within the ejecta arising from the radioactive decay of mostly 56Ni and 56Co. Results: We show that an ALI procedure we have developed for the mutual interaction of strong spectral lines appearing in the atmospheres of SNe Ia solves the long-standing problem of transferring the radiative energy from the UV into the optical regime. Our new method thus constitutes a foundation for more refined models, such as those including energy deposition. In this regard we furthermore show synthetic spectra obtained with various methods adopted for the released energy and compare them with observations. We discuss in detail applications of the diagnostic technique by example of a standard type Ia supernova, where the comparison of calculated and observed spectra revealed that in the early phases the consideration of the energy deposition within the spectrum-forming regions of the ejecta does not qualitatively alter the shape of the emergent spectra. Conclusions: The results of our investigation lead to an improved understanding of how the shape of the spectrum changes radically as function of depth in the ejecta, and show how different emergent spectra are formed as a result of the particular physical properties of SNe Ia ejecta and the resulting peculiarities in the radiative transfer. This knowledge provides an important insight into the process of extracting information from observed SN Ia spectra, since these spectra are a complex product of numerous unobservable SN Ia spectral features, which are thus analyzed in parallel to the observable SN Ia spectral features.
Robust determination of the chemical potential in the pole expansion and selected inversion method for solving Kohn-Sham density functional theory

NASA Astrophysics Data System (ADS)

Jia, Weile; Lin, Lin

2017-10-01

Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single step of evaluation of the Fermi operator. Hence multiple evaluations are needed to be sequentially performed to compute the chemical potential to ensure the correct number of electrons within a given tolerance. This hinders the performance of FOE methods in practice. In this paper, we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential reaches its convergence along the SCF iteration. Instead of evaluating the Fermi operator for multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics.
3D unstructured-mesh radiation transport codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morel, J.

1997-12-31

Three unstructured-mesh radiation transport codes are currently being developed at Los Alamos National Laboratory. The first code is ATTILA, which uses an unstructured tetrahedral mesh in conjunction with standard Sn (discrete-ordinates) angular discretization, standard multigroup energy discretization, and linear-discontinuous spatial differencing. ATTILA solves the standard first-order form of the transport equation using source iteration in conjunction with diffusion-synthetic acceleration of the within-group source iterations. DANTE is designed to run primarily on workstations. The second code is DANTE, which uses a hybrid finite-element mesh consisting of arbitrary combinations of hexahedra, wedges, pyramids, and tetrahedra. DANTE solves several second-order self-adjoint forms of the transport equation including the even-parity equation, the odd-parity equation, and a new equation called the self-adjoint angular flux equation. DANTE also offers three angular discretization options:more » $$S{_}n$$ (discrete-ordinates), $$P{_}n$$ (spherical harmonics), and $$SP{_}n$$ (simplified spherical harmonics). DANTE is designed to run primarily on massively parallel message-passing machines, such as the ASCI-Blue machines at LANL and LLNL. The third code is PERICLES, which uses the same hybrid finite-element mesh as DANTE, but solves the standard first-order form of the transport equation rather than a second-order self-adjoint form. DANTE uses a standard $$S{_}n$$ discretization in angle in conjunction with trilinear-discontinuous spatial differencing, and diffusion-synthetic acceleration of the within-group source iterations. PERICLES was initially designed to run on workstations, but a version for massively parallel message-passing machines will be built. The three codes will be described in detail and computational results will be presented.« less
Robust determination of the chemical potential in the pole expansion and selected inversion method for solving Kohn-Sham density functional theory.

PubMed

Jia, Weile; Lin, Lin

2017-10-14

Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single step of evaluation of the Fermi operator. Hence multiple evaluations are needed to be sequentially performed to compute the chemical potential to ensure the correct number of electrons within a given tolerance. This hinders the performance of FOE methods in practice. In this paper, we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential reaches its convergence along the SCF iteration. Instead of evaluating the Fermi operator for multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics.
Parabolized Navier-Stokes solutions of separation and trailing-edge flows

NASA Technical Reports Server (NTRS)

Brown, J. L.

1983-01-01

A robust, iterative solution procedure is presented for the parabolized Navier-Stokes or higher order boundary layer equations as applied to subsonic viscous-inviscid interaction flows. The robustness of the present procedure is due, in part, to an improved algorithmic formulation. The present formulation is based on a reinterpretation of stability requirements for this class of algorithms and requires only second order accurate backward or central differences for all streamwise derivatives. Upstream influence is provided for through the algorithmic formulation and iterative sweeps in x. The primary contribution to robustness, however, is the boundary condition treatment, which imposes global constraints to control the convergence path. Discussed are successful calculations of subsonic, strong viscous-inviscid interactions, including separation. These results are consistent with Navier-Stokes solutions and triple deck theory.
A fresh look at electron cyclotron current drive power requirements for stabilization of tearing modes in ITER

DOE Office of Scientific and Technical Information (OSTI.GOV)

La Haye, R. J., E-mail: lahaye@fusion.gat.com

2015-12-10

ITER is an international project to design and build an experimental fusion reactor based on the “tokamak” concept. ITER relies upon localized electron cyclotron current drive (ECCD) at the rational safety factor q=2 to suppress or stabilize the expected poloidal mode m=2, toroidal mode n=1 neoclassical tearing mode (NTM) islands. Such islands if unmitigated degrade energy confinement, lock to the resistive wall (stop rotating), cause loss of “H-mode” and induce disruption. The International Tokamak Physics Activity (ITPA) on MHD, Disruptions and Magnetic Control joint experiment group MDC-8 on Current Drive Prevention/Stabilization of Neoclassical Tearing Modes started in 2005, after whichmore » assessments were made for the requirements for ECCD needed in ITER, particularly that of rf power and alignment on q=2 [1]. Narrow well-aligned rf current parallel to and of order of one percent of the total plasma current is needed to replace the “missing” current in the island O-points and heal or preempt (avoid destabilization by applying ECCD on q=2 in absence of the mode) the island [2-4]. This paper updates the advances in ECCD stabilization on NTMs learned in DIII-D experiments and modeling during the last 5 to 10 years as applies to stabilization by localized ECCD of tearing modes in ITER. This includes the ECCD (inside the q=1 radius) stabilization of the NTM “seeding” instability known as sawteeth (m/n=1/1) [5]. Recent measurements in DIII-D show that the ITER-similar current profile is classically unstable, curvature stabilization must not be neglected, and the small island width stabilization effect from helical ion polarization currents is stronger than was previously thought [6]. The consequences of updated assumptions in ITER modeling of the minimum well-aligned ECCD power needed are all-in-all favorable (and well-within the ITER 24 gyrotron capability) when all effects are included. However, a “wild card” may be broadening of the localized ECCD by the presence of the island; various theories predict broadening could occur and there is experimental evidence for broadening in DIII-D. Wider than now expected ECCD in ITER would make alignment easier to do but weaken the stabilization and thus require more rf power. In addition to updated modeling for ITER, advances in the ITER-relevant DIII-D ECCD gyrotron launch mirror control system hardware and real-time plasma control system have been made [7] and there are plans for application in DIII-D ITER demonstration discharges.« less
A fresh look at electron cyclotron current drive power requirements for stabilization of tearing modes in ITER

NASA Astrophysics Data System (ADS)

La Haye, R. J.

2015-12-01

ITER is an international project to design and build an experimental fusion reactor based on the "tokamak" concept. ITER relies upon localized electron cyclotron current drive (ECCD) at the rational safety factor q=2 to suppress or stabilize the expected poloidal mode m=2, toroidal mode n=1 neoclassical tearing mode (NTM) islands. Such islands if unmitigated degrade energy confinement, lock to the resistive wall (stop rotating), cause loss of "H-mode" and induce disruption. The International Tokamak Physics Activity (ITPA) on MHD, Disruptions and Magnetic Control joint experiment group MDC-8 on Current Drive Prevention/Stabilization of Neoclassical Tearing Modes started in 2005, after which assessments were made for the requirements for ECCD needed in ITER, particularly that of rf power and alignment on q=2 [1]. Narrow well-aligned rf current parallel to and of order of one percent of the total plasma current is needed to replace the "missing" current in the island O-points and heal or preempt (avoid destabilization by applying ECCD on q=2 in absence of the mode) the island [2-4]. This paper updates the advances in ECCD stabilization on NTMs learned in DIII-D experiments and modeling during the last 5 to 10 years as applies to stabilization by localized ECCD of tearing modes in ITER. This includes the ECCD (inside the q=1 radius) stabilization of the NTM "seeding" instability known as sawteeth (m/n=1/1) [5]. Recent measurements in DIII-D show that the ITER-similar current profile is classically unstable, curvature stabilization must not be neglected, and the small island width stabilization effect from helical ion polarization currents is stronger than was previously thought [6]. The consequences of updated assumptions in ITER modeling of the minimum well-aligned ECCD power needed are all-in-all favorable (and well-within the ITER 24 gyrotron capability) when all effects are included. However, a "wild card" may be broadening of the localized ECCD by the presence of the island; various theories predict broadening could occur and there is experimental evidence for broadening in DIII-D. Wider than now expected ECCD in ITER would make alignment easier to do but weaken the stabilization and thus require more rf power. In addition to updated modeling for ITER, advances in the ITER-relevant DIII-D ECCD gyrotron launch mirror control system hardware and real-time plasma control system have been made [7] and there are plans for application in DIII-D ITER demonstration discharges.
Stepwise Iterative Fourier Transform: The SIFT

NASA Technical Reports Server (NTRS)

Benignus, V. A.; Benignus, G.

1975-01-01

A program, designed specifically to study the respective effects of some common data problems on results obtained through stepwise iterative Fourier transformation of synthetic data with known waveform composition, was outlined. Included in this group were the problems of gaps in the data, different time-series lengths, periodic but nonsinusoidal waveforms, and noisy (low signal-to-noise) data. Results on sinusoidal data were also compared with results obtained on narrow band noise with similar characteristics. The findings showed that the analytic procedure under study can reliably reduce data in the nature of (1) sinusoids in noise, (2) asymmetric but periodic waves in noise, and (3) sinusoids in noise with substantial gaps in the data. The program was also able to analyze narrow-band noise well, but with increased interpretational problems. The procedure was shown to be a powerful technique for analysis of periodicities, in comparison with classical spectrum analysis techniques. However, informed use of the stepwise procedure nevertheless requires some background of knowledge concerning characteristics of the biological processes under study.
Comparison between iteration schemes for three-dimensional coordinate-transformed saturated-unsaturated flow model

NASA Astrophysics Data System (ADS)

An, Hyunuk; Ichikawa, Yutaka; Tachikawa, Yasuto; Shiiba, Michiharu

2012-11-01

SummaryThree different iteration methods for a three-dimensional coordinate-transformed saturated-unsaturated flow model are compared in this study. The Picard and Newton iteration methods are the common approaches for solving Richards' equation. The Picard method is simple to implement and cost-efficient (on an individual iteration basis). However it converges slower than the Newton method. On the other hand, although the Newton method converges faster, it is more complex to implement and consumes more CPU resources per iteration than the Picard method. The comparison of the two methods in finite-element model (FEM) for saturated-unsaturated flow has been well evaluated in previous studies. However, two iteration methods might exhibit different behavior in the coordinate-transformed finite-difference model (FDM). In addition, the Newton-Krylov method could be a suitable alternative for the coordinate-transformed FDM because it requires the evaluation of a 19-point stencil matrix. The formation of a 19-point stencil is quite a complex and laborious procedure. Instead, the Newton-Krylov method calculates the matrix-vector product, which can be easily approximated by calculating the differences of the original nonlinear function. In this respect, the Newton-Krylov method might be the most appropriate iteration method for coordinate-transformed FDM. However, this method involves the additional cost of taking an approximation at each Krylov iteration in the Newton-Krylov method. In this paper, we evaluated the efficiency and robustness of three iteration methods—the Picard, Newton, and Newton-Krylov methods—for simulating saturated-unsaturated flow through porous media using a three-dimensional coordinate-transformed FDM.
77 FR 50012 - Standard Instrument Approach Procedures, and Takeoff Minimums and Obstacle Departure Procedures...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-20

..., Amdt 2 Oklahoma City, OK, Will Rogers World, Takeoff Minimums and Obstacle DP, Amdt 1 Portland, OR... CLOSE PARALLEL), Amdt 4, CANCELED Philadelphia, PA, Philadelphia Intl, ILS PRM RWY 27L (SIMULTANEOUS CLOSE PARALLEL), Amdt 3, CANCELED Pittsburgh, PA, Allegheny County, ILS OR LOC RWY 10, Amdt 6 Pittsburgh...
Advances in Global Full Waveform Inversion

NASA Astrophysics Data System (ADS)

Tromp, J.; Bozdag, E.; Lei, W.; Ruan, Y.; Lefebvre, M. P.; Modrak, R. T.; Orsvuran, R.; Smith, J. A.; Komatitsch, D.; Peter, D. B.

2017-12-01

Information about Earth's interior comes from seismograms recorded at its surface. Seismic imaging based on spectral-element and adjoint methods has enabled assimilation of this information for the construction of 3D (an)elastic Earth models. These methods account for the physics of wave excitation and propagation by numerically solving the equations of motion, and require the execution of complex computational procedures that challenge the most advanced high-performance computing systems. Current research is petascale; future research will require exascale capabilities. The inverse problem consists of reconstructing the characteristics of the medium from -often noisy- observations. A nonlinear functional is minimized, which involves both the misfit to the measurements and a Tikhonov-type regularization term to tackle inherent ill-posedness. Achieving scalability for the inversion process on tens of thousands of multicore processors is a task that offers many research challenges. We initiated global "adjoint tomography" using 253 earthquakes and produced the first-generation model named GLAD-M15, with a transversely isotropic model parameterization. We are currently running iterations for a second-generation anisotropic model based on the same 253 events. In parallel, we continue iterations for a transversely isotropic model with a larger dataset of 1,040 events to determine higher-resolution plume and slab images. A significant part of our research has focused on eliminating I/O bottlenecks in the adjoint tomography workflow. This has led to the development of a new Adaptable Seismic Data Format based on HDF5, and post-processing tools based on the ADIOS library developed by Oak Ridge National Laboratory. We use the Ensemble Toolkit for workflow stabilization & management to automate the workflow with minimal human interaction.

Improved pressure-velocity coupling algorithm based on minimization of global residual norm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chatwani, A.U.; Turan, A.

1991-01-01

In this paper an improved pressure velocity coupling algorithm is proposed based on the minimization of the global residual norm. The procedure is applied to SIMPLE and SIMPLEC algorithms to automatically select the pressure underrelaxation factor to minimize the global residual norm at each iteration level. Test computations for three-dimensional turbulent, isothermal flow is a toroidal vortex combustor indicate that velocity underrelaxation factors as high as 0.7 can be used to obtain a converged solution in 300 iterations.
Terminal Area Procedures for Paired Runways

NASA Technical Reports Server (NTRS)

Lozito, Sandra; Verma, Savita Arora

2011-01-01

Parallel runway operations have been found to increase capacity within the National Airspace but poor visibility conditions reduce the use of these operations. The NextGen and SESAR Programs have identified the capacity benefits from increased use of closely-space parallel runway. Previous research examined the concepts and procedures related to parallel runways however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This simulation study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15s (+/- 10s error) at a coupling point that was about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and auto speed control future automation) as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Results show the operations in this study were acceptable and safe. Subjective workload, when using the pairing procedures and tools, was generally low for both controllers and pilots, and situation awareness was typically moderate to high. Pilot workload was influenced by display type and automation condition. Further research on pairing and off-nominal conditions is required however, this investigation identified promising findings about the feasibility of closely-spaced parallel runway operations.
PISCES: An environment for parallel scientific computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Scaling Optimization of the SIESTA MHD Code

NASA Astrophysics Data System (ADS)

Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan

2013-10-01

SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Symmetry Breaking by Parallel Flow Shear

NASA Astrophysics Data System (ADS)

Li, Jiacong; Diamond, Patrick

2015-11-01

Plasma rotation is important in reducing turbulent transport, suppressing MHD instabilities, and is beneficial to confinement. Intrinsic rotation without an external momentum input is of interest for its plausible application on ITER. k∥ spectrum asymmetry is required for residual Reynolds stress that drives the intrinsic rotation. Parallel flows are reported in linear devices without magnetic shear. In CSDX, parallel flows are mostly peaked in the core [Thakur et al., 2014]; more robust flows and reversed profiles are seen in PANTA [Oldenburger, et al. 2012]. A novel mechanism for symmetry breaking in momentum transport is proposed. Magnetic shear or mean flow profile are not required. A seed parallel flow shear (PFS) sets the sign of residual stress by selecting certain modes to grow faster. The resulted spectrum imbalance leads to a nonzero residual stress, which further drives a parallel flow with ∇n as the free energy source, adding to the shear until saturated by diffusion. Balanced flow gradient is set by Π∥Res /χϕ . Residual stress is calculated for ITG turbulence and collisional drift wave turbulence where electron-ion and electron-neutral collisions are discussed and compared. Numerical simulation is proposed for testing the effect of PFS.
A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

NASA Astrophysics Data System (ADS)

Ma, Sangback

In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering ahd ILU(0) in the Wavefront ordering outperform the other methods but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
Convergence characteristics of nonlinear vortex-lattice methods for configuration aerodynamics

NASA Technical Reports Server (NTRS)

Seginer, A.; Rusak, Z.; Wasserstrom, E.

1983-01-01

Nonlinear panel methods have no proof for the existence and uniqueness of their solutions. The convergence characteristics of an iterative, nonlinear vortex-lattice method are, therefore, carefully investigated. The effects of several parameters, including (1) the surface-paneling method, (2) an integration method of the trajectories of the wake vortices, (3) vortex-grid refinement, and (4) the initial conditions for the first iteration on the computed aerodynamic coefficients and on the flow-field details are presented. The convergence of the iterative-solution procedure is usually rapid. The solution converges with grid refinement to a constant value, but the final value is not unique and varies with the wing surface-paneling and wake-discretization methods within some range in the vicinity of the experimental result.
Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

NASA Astrophysics Data System (ADS)

Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

2014-06-01

We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the parallel context.
VINE: A Variational Inference -Based Bayesian Neural Network Engine

DTIC Science & Technology

2018-01-01

networks are trained using the same dataset and hyper parameter settings as discussed. Table 1 Performance evaluation of the proposed transfer learning...multiplication/addition/subtraction. These operations can be implemented using nested loops in which various iterations of a loop are independent of...each other. This introduces an opportunity for optimization where a loop may be unrolled fully or partially to increase parallelism at the cost of
Parallel Worlds: Agile and Waterfall Differences and Similarities

DTIC Science & Technology

2013-10-01

development model , and it is deliberately shorter than the Agile Overview as most readers are assumed to be from the Traditional World. For a more in...process of DODI 5000 does not forbid the iterative incremental software development model with frequent end-user interaction, it requires heroics on...added). Today, many of the DOD’s large IT programs therefore continue to adopt program structures and software development models closely
Terascale Optimal PDE Simulations (TOPS) Center

DOE Office of Scientific and Technical Information (OSTI.GOV)

Professor Olof B. Widlund

2007-07-09

Our work has focused on the development and analysis of domain decomposition algorithms for a variety of problems arising in continuum mechanics modeling. In particular, we have extended and analyzed FETI-DP and BDDC algorithms; these iterative solvers were first introduced and studied by Charbel Farhat and his collaborators, see [11, 45, 12], and by Clark Dohrmann of SANDIA, Albuquerque, see [43, 2, 1], respectively. These two closely related families of methods are of particular interest since they are used more extensively than other iterative substructuring methods to solve very large and difficult problems. Thus, the FETI algorithms are part ofmore » the SALINAS system developed by the SANDIA National Laboratories for very large scale computations, and as already noted, BDDC was first developed by a SANDIA scientist, Dr. Clark Dohrmann. The FETI algorithms are also making inroads in commercial engineering software systems. We also note that the analysis of these algorithms poses very real mathematical challenges. The success in developing this theory has, in several instances, led to significant improvements in the performance of these algorithms. A very desirable feature of these iterative substructuring and other domain decomposition algorithms is that they respect the memory hierarchy of modern parallel and distributed computing systems, which is essential for approaching peak floating point performance. The development of improved methods, together with more powerful computer systems, is making it possible to carry out simulations in three dimensions, with quite high resolution, relatively easily. This work is supported by high quality software systems, such as Argonne's PETSc library, which facilitates code development as well as the access to a variety of parallel and distributed computer systems. The success in finding scalable and robust domain decomposition algorithms for very large number of processors and very large finite element problems is, e.g., illustrated in [24, 25, 26]. This work is based on [29, 31]. Our work over these five and half years has, in our opinion, helped advance the knowledge of domain decomposition methods significantly. We see these methods as providing valuable alternatives to other iterative methods, in particular, those based on multi-grid. In our opinion, our accomplishments also match the goals of the TOPS project quite closely.« less
75 FR 55963 - Standard Instrument Approach Procedures, and Takeoff Minimums and Obstacle Departure Procedures...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-09-15

... World, RNAV (RNP) Y RWY 17L, Amdt 1A Oklahoma City, OK, Will Rogers World, RNAV (RNP) Z RWY 35R, Orig-B... PRM RWY 8L (CAT III), Orig-B (Simultaneous Close Parallel) Atlanta, GA, Hartsfield-Jackson Atlanta Intl, ILS PRM RWY 8R, (Simultaneous Close Parallel), Orig-A [[Page 55965
Prospects for Advanced Tokamak Operation of ITER

NASA Astrophysics Data System (ADS)

Neilson, George H.

1996-11-01

Previous studies have identified steady-state (or "advanced") modes for ITER, based on reverse-shear profiles and significant bootstrap current. A typical example has 12 MA of plasma current, 1,500 MW of fusion power, and 100 MW of heating and current-drive power. The implementation of these and other steady-state operating scenarios in the ITER device is examined in order to identify key design modifications that can enhance the prospects for successfully achieving advanced tokamak operating modes in ITER compatible with a single null divertor design. In particular, we examine plasma configurations that can be achieved by the ITER poloidal field system with either a monolithic central solenoid (as in the ITER Interim Design), or an alternate "hybrid" central solenoid design which provides for greater flexibility in the plasma shape. The increased control capability and expanded operating space provided by the hybrid central solenoid allows operation at high triangularity (beneficial for improving divertor performance through control of edge-localized modes and for increasing beta limits), and will make it much easier for ITER operators to establish an optimum startup trajectory leading to a high-performance, steady-state scenario. Vertical position control is examined because plasmas made accessible by the hybrid central solenoid can be more elongated and/or less well coupled to the conducting structure. Control of vertical-displacements using the external PF coils remains feasible over much of the expanded operating space. Further work is required to define the full spectrum of axisymmetric plasma disturbances requiring active control In addition to active axisymmetric control, advanced tokamak modes in ITER may require active control of kink modes on the resistive time scale of the conducting structure. This might be accomplished in ITER through the use of active control coils external to the vacuum vessel which are actuated by magnetic sensors near the first wall. The enhanced shaping and positioning flexibility provides a range of options for reducing the ripple-induced losses of fast alpha particles--a major limitation on ITER steady-state modes. An alternate approach that we are pursuing in parallel is the inclusion of ferromagnetic inserts to reduce the toroidal field ripple within the plasma chamber. The inclusion of modest design changes such as the hybrid central solenoid, active control coils for kink modes, and ferromagnetic inserts for TF ripple reduction show can greatly increase the flexibility to accommodate advance tokamak operation in ITER. Increased flexibility is important because the optimum operating scenario for ITER cannot be predicted with certainty. While low-inductance, reverse shear modes appear attractive for steady-state operation, high-inductance, high-beta modes are also viable candidates, and it is important that ITER have the flexibility to explore both these, and other, operating regimes.
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less
Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

NASA Astrophysics Data System (ADS)

Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

2016-05-01

This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE PAGES

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...

2017-06-29

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less
Performance of a parallel thermal-hydraulics code TEMPEST

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fann, G.I.; Trent, D.S.

The authors describe the parallelization of the Tempest thermal-hydraulics code. The serial version of this code is used for production quality 3-D thermal-hydraulics simulations. Good speedup was obtained with a parallel diagonally preconditioned BiCGStab non-symmetric linear solver, using a spatial domain decomposition approach for the semi-iterative pressure-based and mass-conserved algorithm. The test case used here to illustrate the performance of the BiCGStab solver is a 3-D natural convection problem modeled using finite volume discretization in cylindrical coordinates. The BiCGStab solver replaced the LSOR-ADI method for solving the pressure equation in TEMPEST. BiCGStab also solves the coupled thermal energy equation. Scalingmore » performance of 3 problem sizes (221220 nodes, 358120 nodes, and 701220 nodes) are presented. These problems were run on 2 different parallel machines: IBM-SP and SGI PowerChallenge. The largest problem attains a speedup of 68 on an 128 processor IBM-SP. In real terms, this is over 34 times faster than the fastest serial production time using the LSOR-ADI solver.« less
Nomarski differential interference contrast microscopy for surface slope measurements: an examination of techniques

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hartman, J.S.; Gordon, R.L.; Lessor, D.L.

1981-08-01

Alternate measurement and data analysis procedures are discussed and compared for the application of reflective Nomarski differential interference contrast microscopy for the determination of surface slopes. The discussion includes the interpretation of a previously reported iterative procedure using the results of a detailed optical model and the presentation of a new procedure based on measured image intensity extrema. Surface slope determinations from these procedures are presented and compared with results from a previously reported curve fit analysis of image intensity data. The accuracy and advantages of the different procedures are discussed.
A comparative study of coarse-graining methods for polymeric fluids: Mori-Zwanzig vs. iterative Boltzmann inversion vs. stochastic parametric optimization

NASA Astrophysics Data System (ADS)

Li, Zhen; Bian, Xin; Yang, Xiu; Karniadakis, George Em

2016-07-01

We construct effective coarse-grained (CG) models for polymeric fluids by employing two coarse-graining strategies. The first one is a forward-coarse-graining procedure by the Mori-Zwanzig (MZ) projection while the other one applies a reverse-coarse-graining procedure, such as the iterative Boltzmann inversion (IBI) and the stochastic parametric optimization (SPO). More specifically, we perform molecular dynamics (MD) simulations of star polymer melts to provide the atomistic fields to be coarse-grained. Each molecule of a star polymer with internal degrees of freedom is coarsened into a single CG particle and the effective interactions between CG particles can be either evaluated directly from microscopic dynamics based on the MZ formalism, or obtained by the reverse methods, i.e., IBI and SPO. The forward procedure has no free parameters to tune and recovers the MD system faithfully. For the reverse procedure, we find that the parameters in CG models cannot be selected arbitrarily. If the free parameters are properly defined, the reverse CG procedure also yields an accurate effective potential. Moreover, we explain how an aggressive coarse-graining procedure introduces the many-body effect, which makes the pairwise potential invalid for the same system at densities away from the training point. From this work, general guidelines for coarse-graining of polymeric fluids can be drawn.
A comparative study of coarse-graining methods for polymeric fluids: Mori-Zwanzig vs. iterative Boltzmann inversion vs. stochastic parametric optimization.

PubMed

Li, Zhen; Bian, Xin; Yang, Xiu; Karniadakis, George Em

2016-07-28

We construct effective coarse-grained (CG) models for polymeric fluids by employing two coarse-graining strategies. The first one is a forward-coarse-graining procedure by the Mori-Zwanzig (MZ) projection while the other one applies a reverse-coarse-graining procedure, such as the iterative Boltzmann inversion (IBI) and the stochastic parametric optimization (SPO). More specifically, we perform molecular dynamics (MD) simulations of star polymer melts to provide the atomistic fields to be coarse-grained. Each molecule of a star polymer with internal degrees of freedom is coarsened into a single CG particle and the effective interactions between CG particles can be either evaluated directly from microscopic dynamics based on the MZ formalism, or obtained by the reverse methods, i.e., IBI and SPO. The forward procedure has no free parameters to tune and recovers the MD system faithfully. For the reverse procedure, we find that the parameters in CG models cannot be selected arbitrarily. If the free parameters are properly defined, the reverse CG procedure also yields an accurate effective potential. Moreover, we explain how an aggressive coarse-graining procedure introduces the many-body effect, which makes the pairwise potential invalid for the same system at densities away from the training point. From this work, general guidelines for coarse-graining of polymeric fluids can be drawn.

A modified interval symmetric single step procedure ISS-5D for simultaneous inclusion of polynomial zeros

NASA Astrophysics Data System (ADS)

Sham, Atiyah W. M.; Monsi, Mansor; Hassan, Nasruddin; Suleiman, Mohamed

2013-04-01

The aim of this paper is to present a new modified interval symmetric single-step procedure ISS-5D which is the extension from the previous procedure, ISS1. The ISS-5D method will produce successively smaller intervals that are guaranteed to still contain the zeros. The efficiency of this method is measured on the CPU times and the number of iteration. The procedure is run on five test polynomials and the results obtained are shown in this paper.
CGHnormaliter: an iterative strategy to enhance normalization of array CGH data with imbalanced aberrations

PubMed Central

van Houte, Bart PP; Binsl, Thomas W; Hettling, Hannes; Pirovano, Walter; Heringa, Jaap

2009-01-01

Background Array comparative genomic hybridization (aCGH) is a popular technique for detection of genomic copy number imbalances. These play a critical role in the onset of various types of cancer. In the analysis of aCGH data, normalization is deemed a critical pre-processing step. In general, aCGH normalization approaches are similar to those used for gene expression data, albeit both data-types differ inherently. A particular problem with aCGH data is that imbalanced copy numbers lead to improper normalization using conventional methods. Results In this study we present a novel method, called CGHnormaliter, which addresses this issue by means of an iterative normalization procedure. First, provisory balanced copy numbers are identified and subsequently used for normalization. These two steps are then iterated to refine the normalization. We tested our method on three well-studied tumor-related aCGH datasets with experimentally confirmed copy numbers. Results were compared to a conventional normalization approach and two more recent state-of-the-art aCGH normalization strategies. Our findings show that, compared to these three methods, CGHnormaliter yields a higher specificity and precision in terms of identifying the 'true' copy numbers. Conclusion We demonstrate that the normalization of aCGH data can be significantly enhanced using an iterative procedure that effectively eliminates the effect of imbalanced copy numbers. This also leads to a more reliable assessment of aberrations. An R-package containing the implementation of CGHnormaliter is available at . PMID:19709427
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were donemore » using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.« less
A Simple Classroom Simulation of Heat Energy Diffusing through a Metal Bar

ERIC Educational Resources Information Center

Kinsler, Mark; Kinzel, Evelyn

2007-01-01

We present an iterative procedure that does not rely on calculus to model heat flow through a uniform bar of metal and thus avoids the use of the partial differential equation typically needed to describe heat diffusion. The procedure is based on first principles and can be done with students at the blackboard. It results in a plot that…
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

NASA Technical Reports Server (NTRS)

Bailey, David H.; Ferguson, Helaman R. P.

1988-01-01

Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Spotz, William F.

PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of themore » underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.« less
Development of a needle driver with multiple degrees of freedom for neonatal laparoscopic surgery.

PubMed

Ishimaru, Tetsuya; Takazawa, Shinya; Uchida, Hiroo; Kawashima, Hiroshi; Fujii, Masahiro; Harada, Kanako; Sugita, Naohiko; Mitsuishi, Mamoru; Iwanaka, Tadashi

2013-07-01

The aims of this study were to develop a thin needle driver with multiple degrees of freedom and to evaluate its efficacy in multidirectional suturing compared with a conventional needle driver. The tip (15 mm) of the novel user-friendly needle driver (3.5 mm in diameter) has three degrees of freedom for grasping, rotation, and deflection. Six pediatric surgeons performed two kinds of suturing tasks in a dry box: three stitches in continuous suturing that were perpendicular or parallel to the insertion direction of the instrument, first using the novel instrument, then using a conventional instrument, and finally using the novel instrument again. The accuracy of insertion and exit compared with the target points and the procedure time were measured. In the conventional and novel procedures the mean gaps from the insertion point to the target in perpendicular suturing were 0.8 mm and 0.7 mm, respectively; in parallel suturing they were 0.8 mm and 0.6 mm, respectively. The mean gaps from the exit point to the target in perpendicular suturing were 0.6 mm and 0.6 mm for conventional and novel procedures, respectively; in parallel suturing they were 0.6 mm and 0.8 mm, respectively. The procedure time for perpendicular suturing was 33 seconds and 64 seconds for conventional and novel procedures, respectively (P=.02); for parallel suturing it was 114 seconds and 91 seconds, respectively. Our novel needle driver maintained accuracy of suturing; parallel suturing with the novel driver may be easier than with the conventional one.
THERMAL DESIGN OF THE ITER VACUUM VESSEL COOLING SYSTEM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carbajo, Juan J; Yoder Jr, Graydon L; Kim, Seokho H

RELAP5-3D models of the ITER Vacuum Vessel (VV) Primary Heat Transfer System (PHTS) have been developed. The design of the cooling system is described in detail, and RELAP5 results are presented. Two parallel pump/heat exchanger trains comprise the design one train is for full-power operation and the other is for emergency operation or operation at decay heat levels. All the components are located inside the Tokamak building (a significant change from the original configurations). The results presented include operation at full power, decay heat operation, and baking operation. The RELAP5-3D results confirm that the design can operate satisfactorily during bothmore » normal pulsed power operation and decay heat operation. All the temperatures in the coolant and in the different system components are maintained within acceptable operating limits.« less
Hybrid cloud and cluster computing paradigms for life science applications

PubMed Central

2010-01-01

Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.

PubMed

Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey

2010-12-21

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
Conceptual study of the cryocascade for pumping, separation and recycling of ITER torus exhaust

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mack, A.; Perinic, D.

1994-12-31

For pumping, separation and recycling of the ITER plasma exhaust, a pumping system working reliably under ambient and operating conditions of the ITER reactor is required. The pump is exposed to a magnetic field of about 3 T and has to be resistant to radioactive irradiation of 10{sup 9} rad. In the burn and dwell phase, a gas mixture consisting of hydrogen isotopes, helium ash and impurities has to be pumped at the pressure range of 10{sup {minus}2} - 10{sup {minus}1} mbar. Within the framework of the European Fusion Technology Programme, the concept of a primary cryopump for use inmore » ITER is being prepared at KfK. The cryocascade concept is planned to include three pump stages. These pump stages, which are connected in series, consist of individual chambers that may be separated from each other by means of cold valves. In the first stage, the impurities of the reactor exhaust gas are frozen out at 20-30 K. Settling of the hydrogen isotopes H/D/T on the 5 K cryosurfaces takes place in the second stage. This stage is made up of two parallel chambers, which can be switched from the pumping to the regeneration mode or vice versa. The helium fraction is bound in the downstream 5 K adsorption stage.« less
Increasing feasibility of the field-programmable gate array implementation of an iterative image registration using a kernel-warping algorithm

NASA Astrophysics Data System (ADS)

Nguyen, An Hung; Guillemette, Thomas; Lambert, Andrew J.; Pickering, Mark R.; Garratt, Matthew A.

2017-09-01

Image registration is a fundamental image processing technique. It is used to spatially align two or more images that have been captured at different times, from different sensors, or from different viewpoints. There have been many algorithms proposed for this task. The most common of these being the well-known Lucas-Kanade (LK) and Horn-Schunck approaches. However, the main limitation of these approaches is the computational complexity required to implement the large number of iterations necessary for successful alignment of the images. Previously, a multi-pass image interpolation algorithm (MP-I2A) was developed to considerably reduce the number of iterations required for successful registration compared with the LK algorithm. This paper develops a kernel-warping algorithm (KWA), a modified version of the MP-I2A, which requires fewer iterations to successfully register two images and less memory space for the field-programmable gate array (FPGA) implementation than the MP-I2A. These reductions increase feasibility of the implementation of the proposed algorithm on FPGAs with very limited memory space and other hardware resources. A two-FPGA system rather than single FPGA system is successfully developed to implement the KWA in order to compensate insufficiency of hardware resources supported by one FPGA, and increase parallel processing ability and scalability of the system.
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Chiou, Jin-Chern

1990-01-01

Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
Task 7: ADPAC User's Manual

NASA Technical Reports Server (NTRS)

Hall, E. J.; Topp, D. A.; Delaney, R. A.

1996-01-01

The overall objective of this study was to develop a 3-D numerical analysis for compressor casing treatment flowfields. The current version of the computer code resulting from this study is referred to as ADPAC (Advanced Ducted Propfan Analysis Codes-Version 7). This report is intended to serve as a computer program user's manual for the ADPAC code developed under Tasks 6 and 7 of the NASA Contract. The ADPAC program is based on a flexible multiple- block grid discretization scheme permitting coupled 2-D/3-D mesh block solutions with application to a wide variety of geometries. Aerodynamic calculations are based on a four-stage Runge-Kutta time-marching finite volume solution technique with added numerical dissipation. Steady flow predictions are accelerated by a multigrid procedure. An iterative implicit algorithm is available for rapid time-dependent flow calculations, and an advanced two equation turbulence model is incorporated to predict complex turbulent flows. The consolidated code generated during this study is capable of executing in either a serial or parallel computing mode from a single source code. Numerous examples are given in the form of test cases to demonstrate the utility of this approach for predicting the aerodynamics of modem turbomachinery configurations.
Fast simulation techniques for switching converters

NASA Technical Reports Server (NTRS)

King, Roger J.

1987-01-01

Techniques for simulating a switching converter are examined. The state equations for the equivalent circuits, which represent the switching converter, are presented and explained. The uses of the Newton-Raphson iteration, low ripple approximation, half-cycle symmetry, and discrete time equations to compute the interval durations are described. An example is presented in which these methods are illustrated by applying them to a parallel-loaded resonant inverter with three equivalent circuits for its continuous mode of operation.
Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

NASA Technical Reports Server (NTRS)

Tilton, James C.

2003-01-01

A hierarchical set of image segmentations is a set of several image segmentations of the same image at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton, et a1 describes an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g. Horowitz and T. Pavlidis, [3]). In addition, HSEG optionally interjects between HSWO region growing iterations, merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG s computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer, implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG s recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic region growing.
Rapid alignment of nanotomography data using joint iterative reconstruction and reprojection.

PubMed

Gürsoy, Doğa; Hong, Young P; He, Kuan; Hujsak, Karl; Yoo, Seunghwan; Chen, Si; Li, Yue; Ge, Mingyuan; Miller, Lisa M; Chu, Yong S; De Andrade, Vincent; He, Kai; Cossairt, Oliver; Katsaggelos, Aggelos K; Jacobsen, Chris

2017-09-18

As x-ray and electron tomography is pushed further into the nanoscale, the limitations of rotation stages become more apparent, leading to challenges in the alignment of the acquired projection images. Here we present an approach for rapid post-acquisition alignment of these projections to obtain high quality three-dimensional images. Our approach is based on a joint estimation of alignment errors, and the object, using an iterative refinement procedure. With simulated data where we know the alignment error of each projection image, our approach shows a residual alignment error that is a factor of a thousand smaller, and it reaches the same error level in the reconstructed image in less than half the number of iterations. We then show its application to experimental data in x-ray and electron nanotomography.
An iterative requirements specification procedure for decision support systems.

PubMed

Brookes, C H

1987-08-01

Requirements specification is a key element in a DSS development project because it not only determines what is to be done, it also drives the evolution process. A procedure for requirements elicitation is described that is based on the decomposition of the DSS design task into a number of functions, subfunctions, and operators. It is postulated that the procedure facilitates the building of a DSS that is complete and integrates MIS, modelling and expert system components. Some examples given are drawn from the health administration field.
Restoration of multichannel microwave radiometric images

NASA Technical Reports Server (NTRS)

Chin, R. T.; Yeh, C. L.; Olson, W. S.

1983-01-01

A constrained iterative image restoration method is applied to multichannel diffraction-limited imagery. This method is based on the Gerchberg-Papoulis algorithm utilizing incomplete information and partial constraints. The procedure is described using the orthogonal projection operators which project onto two prescribed subspaces iteratively. Some of its properties and limitations are also presented. The selection of appropriate constraints was emphasized in a practical application. Multichannel microwave images, each having different spatial resolution, were restored to a common highest resolution to demonstrate the effectiveness of the method. Both noise-free and noisy images were used in this investigation.
Implementation on a nonlinear concrete cracking algorithm in NASTRAN

NASA Technical Reports Server (NTRS)

Herting, D. N.; Herendeen, D. L.; Hoesly, R. L.; Chang, H.

1976-01-01

A computer code for the analysis of reinforced concrete structures was developed using NASTRAN as a basis. Nonlinear iteration procedures were developed for obtaining solutions with a wide variety of loading sequences. A direct access file system was used to save results at each load step to restart within the solution module for further analysis. A multi-nested looping capability was implemented to control the iterations and change the loads. The basis for the analysis is a set of mutli-layer plate elements which allow local definition of materials and cracking properties.

A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

1992-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
40 CFR 230.5 - General procedures to be followed.

Code of Federal Regulations, 2011 CFR

2011-07-01

... Evaluation and Testing (§ 230.61). (j) Identify appropriate and practicable changes to the project plan to... of illustration. The actual process followed may be iterative, with the results of one step leading...
40 CFR 230.5 - General procedures to be followed.

Code of Federal Regulations, 2010 CFR

2010-07-01

... Evaluation and Testing (§ 230.61). (j) Identify appropriate and practicable changes to the project plan to... of illustration. The actual process followed may be iterative, with the results of one step leading...
Terminal Area Procedures for Paired Runways

NASA Technical Reports Server (NTRS)

Lozito, Sandy

2011-01-01

Parallel Runway operations have been found to increase capacity within the National Airspace (NAS) however, poor visibility conditions reduce this capacity [1]. Much research has been conducted to examine the concepts and procedures related to parallel runways however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15s(+/- 10s error) at a coupling point that is about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: Two levels of flight deck automation (current-day flight deck automation, and a prototype future automation) as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Data showed that the operations in this study were found to be acceptable and safe. Workload when using the pairing procedures and tools was generally low for both controllers and pilots, and situation awareness (SA) was typically moderate to high. There were some differences based upon the display and automation conditions for the pilots. Future research should consider the refinement of the concepts and tools for pilot and controller displays and automation for parallel runway concepts.
Estimating the Minimum Number of Judges Required for Test-Centred Standard Setting on Written Assessments. Do Discussion and Iteration Have an Influence?

ERIC Educational Resources Information Center

Fowell, S. L.; Fewtrell, R.; McLaughlin, P. J.

2008-01-01

Absolute standard setting procedures are recommended for assessment in medical education. Absolute, test-centred standard setting procedures were introduced for written assessments in the Liverpool MBChB in 2001. The modified Angoff and Ebel methods have been used for short answer question-based and extended matching question-based papers,…
Low-authority control synthesis for large space structures

NASA Technical Reports Server (NTRS)

Aubrun, J. N.; Margulies, G.

1982-01-01

The control of vibrations of large space structures by distributed sensors and actuators is studied. A procedure is developed for calculating the feedback loop gains required to achieve specified amounts of damping. For moderate damping (Low Authority Control) the procedure is purely algebraic, but it can be applied iteratively when larger amounts of damping are required and is generalized for arbitrary time invariant systems.
An asymptotic-preserving Lagrangian algorithm for the time-dependent anisotropic heat transport equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.

2014-09-01

We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while themore » second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio X ⊥ /X ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = X⊥L 2 ∥/X1L 2 ⊥ → 0 (with L∥∙ L⊥ , the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.« less
Interdisciplinary Development of an Improved Emergency Department Procedural Work Surface Through Iterative Design and Use Testing in Simulated and Clinical Environments.

PubMed

Zhang, Xiao C; Bermudez, Ana M; Reddy, Pranav M; Sarpatwari, Ravi R; Chheng, Darin B; Mezoian, Taylor J; Schwartz, Victoria R; Simmons, Quinneil J; Jay, Gregory D; Kobayashi, Leo

2017-03-01

A stable and readily accessible work surface for bedside medical procedures represents a valuable tool for acute care providers. In emergency department (ED) settings, the design and implementation of traditional Mayo stands and related surface devices often limit their availability, portability, and usability, which can lead to suboptimal clinical practice conditions that may affect the safe and effective performance of medical procedures and delivery of patient care. We designed and built a novel, open-source, portable, bedside procedural surface through an iterative development process with use testing in simulated and live clinical environments. The procedural surface development project was conducted between October 2014 and June 2016 at an academic referral hospital and its affiliated simulation facility. An interdisciplinary team of emergency physicians, mechanical engineers, medical students, and design students sought to construct a prototype bedside procedural surface out of off-the-shelf hardware during a collaborative university course on health care design. After determination of end-user needs and core design requirements, multiple prototypes were fabricated and iteratively modified, with early variants featuring undermattress stabilizing supports or ratcheting clamp mechanisms. Versions 1 through 4 underwent 2 hands-on usability-testing simulation sessions; version 5 was presented at a design critique held jointly by a panel of clinical and industrial design faculty for expert feedback. Responding to select feedback elements over several surface versions, investigators arrived at a near-final prototype design for fabrication and use testing in a live clinical setting. This experimental procedural surface (version 8) was constructed and then deployed for controlled usability testing against the standard Mayo stands in use at the study site ED. Clinical providers working in the ED who opted to participate in the study were provided with the prototype surface and just-in-time training on its use when performing bedside procedures. Subjects completed the validated 10-point System Usability Scale postshift for the surface that they had used. The study protocol was approved by the institutional review board. Multiple prototypes and recursive design revisions resulted in a fully functional, portable, and durable bedside procedural surface that featured a stainless steel tray and intuitive hook-and-lock mechanisms for attachment to ED stretcher bed rails. Forty-two control and 40 experimental group subjects participated and completed questionnaires. The median System Usability Scale score (out of 100; higher scores associated with better usability) was 72.5 (interquartile range [IQR] 51.3 to 86.3) for the Mayo stand; the experimental surface was scored at 93.8 (IQR 84.4 to 97.5 for a difference in medians of 17.5 (95% confidence interval 10 to 27.5). Subjects reported several usability challenges with the Mayo stand; the experimental surface was reviewed as easy to use, simple, and functional. In accordance with experimental live environment deployment, questionnaire responses, and end-user suggestions, the project team finalized the design specification for the experimental procedural surface for open dissemination. An iterative, interdisciplinary approach was used to generate, evaluate, revise, and finalize the design specification for a new procedural surface that met all core end-user requirements. The final surface design was evaluated favorably on a validated usability tool against Mayo stands when use tested in simulated and live clinical settings. Copyright © 2016 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.
A new design approach to innovative spectrometers. Case study: TROPOLITE

NASA Astrophysics Data System (ADS)

Volatier, Jean-Baptiste; Baümer, Stefan; Kruizinga, Bob; Vink, Rob

2014-05-01

Designing a novel optical system is a nested iterative process. The optimization loop, from a starting point to final system is already mostly automated. However this loop is part of a wider loop which is not. This wider loop starts with an optical specification and ends with a manufacturability assessment. When designing a new spectrometer with emphasis on weight and cost, numerous iterations between the optical- and mechanical designer are inevitable. The optical designer must then be able to reliably produce optical designs based on new input gained from multidisciplinary studies. This paper presents a procedure that can automatically generate new starting points based on any kind of input or new constraint that might arise. These starting points can then be handed over to a generic optimization routine to make the design tasks extremely efficient. The optical designer job is then not to design optical systems, but to meta-design a procedure that produces optical systems paving the way for system level optimization. We present here this procedure and its application to the design of TROPOLITE a lightweight push broom imaging spectrometer.
Optimization applications in aircraft engine design and test

NASA Technical Reports Server (NTRS)

Pratt, T. K.

1984-01-01

Starting with the NASA-sponsored STAEBL program, optimization methods based primarily upon the versatile program COPES/CONMIN were introduced over the past few years to a broad spectrum of engineering problems in structural optimization, engine design, engine test, and more recently, manufacturing processes. By automating design and testing processes, many repetitive and costly trade-off studies have been replaced by optimization procedures. Rather than taking engineers and designers out of the loop, optimization has, in fact, put them more in control by providing sophisticated search techniques. The ultimate decision whether to accept or reject an optimal feasible design still rests with the analyst. Feedback obtained from this decision process has been invaluable since it can be incorporated into the optimization procedure to make it more intelligent. On several occasions, optimization procedures have produced novel designs, such as the nonsymmetric placement of rotor case stiffener rings, not anticipated by engineering designers. In another case, a particularly difficult resonance contraint could not be satisfied using hand iterations for a compressor blade, when the STAEBL program was applied to the problem, a feasible solution was obtained in just two iterations.
Two-dimensional imaging of two types of radicals by the CW-EPR method

NASA Astrophysics Data System (ADS)

Czechowski, Tomasz; Krzyminiewski, Ryszard; Jurga, Jan; Chlewicki, Wojciech

2008-01-01

The CW-EPR method of image reconstruction is based on sample rotation in a magnetic field with a constant gradient (50 G/cm). In order to obtain a projection (radical density distribution) along a given direction, the EPR spectra are recorded with and without the gradient. Deconvolution, then gives the distribution of the spin density. Projection at 36 different angles gives the information that is necessary for reconstruction of the radical distribution. The problem becomes more complex when there are at least two types of radicals in the sample, because the deconvolution procedure does not give satisfactory results. We propose a method to calculate the projections for each radical, based on iterative procedures. The images of density distribution for each radical obtained by our procedure have proved that the method of deconvolution, in combination with iterative fitting, provides correct results. The test was performed on a sample of polymer PPS Br 111 ( p-phenylene sulphide) with glass fibres and minerals. The results indicated a heterogeneous distribution of radicals in the sample volume. The images obtained were in agreement with the known shape of the sample.
Item Selection for the Development of Parallel Forms from an IRT-Based Seed Test Using a Sampling and Classification Approach

ERIC Educational Resources Information Center

Chen, Pei-Hua; Chang, Hua-Hua; Wu, Haiyan

2012-01-01

Two sampling-and-classification-based procedures were developed for automated test assembly: the Cell Only and the Cell and Cube methods. A simulation study based on a 540-item bank was conducted to compare the performance of the procedures with the performance of a mixed-integer programming (MIP) method for assembling multiple parallel test…
A semi-direct procedure using a local relaxation factor and its application to an internal flow problem

NASA Technical Reports Server (NTRS)

Chang, S. C.

1984-01-01

Generally, fast direct solvers are not directly applicable to a nonseparable elliptic partial differential equation. This limitation, however, is circumvented by a semi-direct procedure, i.e., an iterative procedure using fast direct solvers. An efficient semi-direct procedure which is easy to implement and applicable to a variety of boundary conditions is presented. The current procedure also possesses other highly desirable properties, i.e.: (1) the convergence rate does not decrease with an increase of grid cell aspect ratio, and (2) the convergence rate is estimated using the coefficients of the partial differential equation being solved.
Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

NASA Astrophysics Data System (ADS)

Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

2015-05-01

3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of electrohydrodynamic (EHD) ion-drag micropump. Finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, simulation results are also compared to examine the difference between the results. The main objective was to analyze the computational time required by both the methods with respect to different grid sizes and parallelize the Jacobi method to reduce the computational time. In common, the SGS method is faster than the SJ method but the data parallelism of Jacobi method may produce good speedup over SGS method. In this study, the feasibility of using parallel Jacobi (PJ) method is attempted in relation to SGS method. MATLAB Parallel/Distributed computing environment is used and a parallel code for SJ method is implemented. It was found that for small grid size the SGS method remains dominant over SJ method and PJ method while for large grid size both the sequential methods may take nearly too much processing time to converge. Yet, the PJ method reduces computational time to some extent for large grid sizes.
Parallel ptychographic reconstruction

DOE PAGES

Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; ...

2014-12-19

Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps tomore » take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.« less
Gyrokinetic theory of turbulent acceleration and momentum conservation in tokamak plasmas

NASA Astrophysics Data System (ADS)

Lu, WANG; Shuitao, PENG; P, H. DIAMOND

2018-07-01

Understanding the generation of intrinsic rotation in tokamak plasmas is crucial for future fusion reactors such as ITER. We proposed a new mechanism named turbulent acceleration for the origin of the intrinsic parallel rotation based on gyrokinetic theory. The turbulent acceleration acts as a local source or sink of parallel rotation, i.e., volume force, which is different from the divergence of residual stress, i.e., surface force. However, the order of magnitude of turbulent acceleration can be comparable to that of the divergence of residual stress for electrostatic ion temperature gradient (ITG) turbulence. A possible theoretical explanation for the experimental observation of electron cyclotron heating induced decrease of co-current rotation was also proposed via comparison between the turbulent acceleration driven by ITG turbulence and that driven by collisionless trapped electron mode turbulence. We also extended this theory to electromagnetic ITG turbulence and investigated the electromagnetic effects on intrinsic parallel rotation drive. Finally, we demonstrated that the presence of turbulent acceleration does not conflict with momentum conservation.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.

PubMed

Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J

2015-07-05

We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Design and performance evaluation of a 20-aperture multipinhole collimator for myocardial perfusion imaging applications.

PubMed

Bowen, Jason D; Huang, Qiu; Ellin, Justin R; Lee, Tzu-Cheng; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2013-10-21

Single photon emission computed tomography (SPECT) myocardial perfusion imaging remains a critical tool in the diagnosis of coronary artery disease. However, after more than three decades of use, photon detection efficiency remains poor and unchanged. This is due to the continued reliance on parallel-hole collimators first introduced in 1964. These collimators possess poor geometric efficiency. Here we present the performance evaluation results of a newly designed multipinhole collimator with 20 pinhole apertures (PH20) for commercial SPECT systems. Computer simulations and numerical observer studies were used to assess the noise, bias and diagnostic imaging performance of a PH20 collimator in comparison with those of a low energy high resolution (LEHR) parallel-hole collimator. Ray-driven projector/backprojector pairs were used to model SPECT imaging acquisitions, including simulation of noiseless projection data and performing MLEM/OSEM image reconstructions. Poisson noise was added to noiseless projections for realistic projection data. Noise and bias performance were investigated for five mathematical cardiac and torso (MCAT) phantom anatomies imaged at two gantry orbit positions (19.5 and 25.0 cm). PH20 and LEHR images were reconstructed with 300 MLEM iterations and 30 OSEM iterations (ten subsets), respectively. Diagnostic imaging performance was assessed by a receiver operating characteristic (ROC) analysis performed on a single MCAT phantom; however, in this case PH20 images were reconstructed with 75 pixel-based OSEM iterations (four subsets). Four PH20 projection views from two positions of a dual-head camera acquisition and 60 LEHR projections were simulated for all studies. At uniformly-imposed resolution of 12.5 mm, significant improvements in SNR and diagnostic sensitivity (represented by the area under the ROC curve, or AUC) were realized when PH20 collimators are substituted for LEHR parallel-hole collimators. SNR improves by factors of 1.94-2.34 for the five patient anatomies and two orbital positions studied. For the ROC analysis the PH20 AUC is larger than the LEHR AUC with a p-value of 0.0067. Bias performance, however, decreases with the use of PH20 collimators. Systematic analyses showed PH20 collimators present improved diagnostic imaging performance over LEHR collimators, requiring only collimator exchange on existing SPECT cameras for their use.
Design and performance evaluation of a 20-aperture multipinhole collimator for myocardial perfusion imaging applications

NASA Astrophysics Data System (ADS)

Bowen, Jason D.; Huang, Qiu; Ellin, Justin R.; Lee, Tzu-Cheng; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2013-10-01

Single photon emission computed tomography (SPECT) myocardial perfusion imaging remains a critical tool in the diagnosis of coronary artery disease. However, after more than three decades of use, photon detection efficiency remains poor and unchanged. This is due to the continued reliance on parallel-hole collimators first introduced in 1964. These collimators possess poor geometric efficiency. Here we present the performance evaluation results of a newly designed multipinhole collimator with 20 pinhole apertures (PH20) for commercial SPECT systems. Computer simulations and numerical observer studies were used to assess the noise, bias and diagnostic imaging performance of a PH20 collimator in comparison with those of a low energy high resolution (LEHR) parallel-hole collimator. Ray-driven projector/backprojector pairs were used to model SPECT imaging acquisitions, including simulation of noiseless projection data and performing MLEM/OSEM image reconstructions. Poisson noise was added to noiseless projections for realistic projection data. Noise and bias performance were investigated for five mathematical cardiac and torso (MCAT) phantom anatomies imaged at two gantry orbit positions (19.5 and 25.0 cm). PH20 and LEHR images were reconstructed with 300 MLEM iterations and 30 OSEM iterations (ten subsets), respectively. Diagnostic imaging performance was assessed by a receiver operating characteristic (ROC) analysis performed on a single MCAT phantom; however, in this case PH20 images were reconstructed with 75 pixel-based OSEM iterations (four subsets). Four PH20 projection views from two positions of a dual-head camera acquisition and 60 LEHR projections were simulated for all studies. At uniformly-imposed resolution of 12.5 mm, significant improvements in SNR and diagnostic sensitivity (represented by the area under the ROC curve, or AUC) were realized when PH20 collimators are substituted for LEHR parallel-hole collimators. SNR improves by factors of 1.94-2.34 for the five patient anatomies and two orbital positions studied. For the ROC analysis the PH20 AUC is larger than the LEHR AUC with a p-value of 0.0067. Bias performance, however, decreases with the use of PH20 collimators. Systematic analyses showed PH20 collimators present improved diagnostic imaging performance over LEHR collimators, requiring only collimator exchange on existing SPECT cameras for their use.
Digital tomosynthesis mammography using a parallel maximum-likelihood reconstruction method

NASA Astrophysics Data System (ADS)

Wu, Tao; Zhang, Juemin; Moore, Richard; Rafferty, Elizabeth; Kopans, Daniel; Meleis, Waleed; Kaeli, David

2004-05-01

A parallel reconstruction method, based on an iterative maximum likelihood (ML) algorithm, is developed to provide fast reconstruction for digital tomosynthesis mammography. Tomosynthesis mammography acquires 11 low-dose projections of a breast by moving an x-ray tube over a 50° angular range. In parallel reconstruction, each projection is divided into multiple segments along the chest-to-nipple direction. Using the 11 projections, segments located at the same distance from the chest wall are combined to compute a partial reconstruction of the total breast volume. The shape of the partial reconstruction forms a thin slab, angled toward the x-ray source at a projection angle 0°. The reconstruction of the total breast volume is obtained by merging the partial reconstructions. The overlap region between neighboring partial reconstructions and neighboring projection segments is utilized to compensate for the incomplete data at the boundary locations present in the partial reconstructions. A serial execution of the reconstruction is compared to a parallel implementation, using clinical data. The serial code was run on a PC with a single PentiumIV 2.2GHz CPU. The parallel implementation was developed using MPI and run on a 64-node Linux cluster using 800MHz Itanium CPUs. The serial reconstruction for a medium-sized breast (5cm thickness, 11cm chest-to-nipple distance) takes 115 minutes, while a parallel implementation takes only 3.5 minutes. The reconstruction time for a larger breast using a serial implementation takes 187 minutes, while a parallel implementation takes 6.5 minutes. No significant differences were observed between the reconstructions produced by the serial and parallel implementations.

Computer method for identification of boiler transfer functions

NASA Technical Reports Server (NTRS)

Miles, J. H.

1972-01-01

Iterative computer aided procedure was developed which provides for identification of boiler transfer functions using frequency response data. Method uses frequency response data to obtain satisfactory transfer function for both high and low vapor exit quality data.
The Simplified Aircraft-Based Paired Approach With the ALAS Alerting Algorithm

NASA Technical Reports Server (NTRS)

Perry, Raleigh B.; Madden, Michael M.; Torres-Pomales, Wilfredo; Butler, Ricky W.

2013-01-01

This paper presents the results of an investigation of a proposed concept for closely spaced parallel runways called the Simplified Aircraft-based Paired Approach (SAPA). This procedure depends upon a new alerting algorithm called the Adjacent Landing Alerting System (ALAS). This study used both low fidelity and high fidelity simulations to validate the SAPA procedure and test the performance of the new alerting algorithm. The low fidelity simulation enabled a determination of minimum approach distance for the worst case over millions of scenarios. The high fidelity simulation enabled an accurate determination of timings and minimum approach distance in the presence of realistic trajectories, communication latencies, and total system error for 108 test cases. The SAPA procedure and the ALAS alerting algorithm were applied to the 750-ft parallel spacing (e.g., SFO 28L/28R) approach problem. With the SAPA procedure as defined in this paper, this study concludes that a 750-ft application does not appear to be feasible, but preliminary results for 1000-ft parallel runways look promising.
Modeling regional freight flow assignment through intermodal terminals

DOT National Transportation Integrated Search

2005-03-01

An analytical model is developed to assign regional freight across a multimodal highway and railway network using geographic information systems. As part of the regional planning process, the model is an iterative procedure that assigns multimodal fr...
DOE Office of Scientific and Technical Information (OSTI.GOV)

Parrish, Robert M.; Liu, Fang; Martínez, Todd J., E-mail: toddjmartinez@gmail.com

We formulate self-consistent field (SCF) theory in terms of an interaction picture where the working variable is the difference density matrix between the true system and a corresponding superposition of atomic densities. As the difference density matrix directly represents the electronic deformations inherent in chemical bonding, this “difference self-consistent field (dSCF)” picture provides a number of significant conceptual and computational advantages. We show that this allows for a stable and efficient dSCF iterative procedure with wholly single-precision Coulomb and exchange matrix builds. We also show that the dSCF iterative procedure can be performed with aggressive screening of the pair space.more » These approximations are tested and found to be accurate for systems with up to 1860 atoms and >10 000 basis functions, providing for immediate overall speedups of up to 70% in the heavily optimized TERACHEM SCF implementation.« less
Eigenproblem solution by a combined Sturm sequence and inverse iteration technique.

NASA Technical Reports Server (NTRS)

Gupta, K. K.

1973-01-01

Description of an efficient and numerically stable algorithm, along with a complete listing of the associated computer program, developed for the accurate computation of specified roots and associated vectors of the eigenvalue problem Aq = lambda Bq with band symmetric A and B, B being also positive-definite. The desired roots are first isolated by the Sturm sequence procedure; then a special variant of the inverse iteration technique is applied for the individual determination of each root along with its vector. The algorithm fully exploits the banded form of relevant matrices, and the associated program written in FORTRAN V for the JPL UNIVAC 1108 computer proves to be most significantly economical in comparison to similar existing procedures. The program may be conveniently utilized for the efficient solution of practical engineering problems, involving free vibration and buckling analysis of structures. Results of such analyses are presented for representative structures.
Systems and methods for predicting materials properties

DOEpatents

Ceder, Gerbrand; Fischer, Chris; Tibbetts, Kevin; Morgan, Dane; Curtarolo, Stefano

2007-11-06

Systems and methods for predicting features of materials of interest. Reference data are analyzed to deduce relationships between the input data sets and output data sets. Reference data includes measured values and/or computed values. The deduced relationships can be specified as equations, correspondences, and/or algorithmic processes that produce appropriate output data when suitable input data is used. In some instances, the output data set is a subset of the input data set, and computational results may be refined by optionally iterating the computational procedure. To deduce features of a new material of interest, a computed or measured input property of the material is provided to an equation, correspondence, or algorithmic procedure previously deduced, and an output is obtained. In some instances, the output is iteratively refined. In some instances, new features deduced for the material of interest are added to a database of input and output data for known materials.
[Tissular expansion in giant congenital nevi treatment].

PubMed

Nguyen Van Nuoi, V; Francois-Fiquet, C; Diner, P; Sergent, B; Zazurca, F; Franchi, G; Buis, J; Vazquez, M-P; Picard, A; Kadlub, N

2014-08-01

Surgical management of giant melanotic naevi remains a surgical challenge. Tissue expansion provides tissue of the same quality for the repair of defects. The aim of this study is to review tissular expansion for giant melanotic naevi. We conducted a retrospective study from 2000 to 2012. All children patients who underwent a tissular expansion for giant congenital naevi had been included. Epidemiological data, surgical procedure, complication rate and results had been analysed. Thirty-tree patients had been included; they underwent 61 procedures with 79 tissular-expansion prosthesis. Previous surgery, mostly simple excision had been performed before tissular expansion. Complete naevus excision had been performed in 63.3% of the cases. Complications occurred in 45% of the cases, however in 50% of them were minor. Iterative surgery increased the complication rate. Tissular expansion is a valuable option for giant congenital naevus. However, complication rate remained high, especially when iterative surgery is needed. Copyright © 2013 Elsevier Masson SAS. All rights reserved.
Efficient fractal-based mutation in evolutionary algorithms from iterated function systems

NASA Astrophysics Data System (ADS)

Salcedo-Sanz, S.; Aybar-Ruíz, A.; Camacho-Gómez, C.; Pereira, E.

2018-03-01

In this paper we present a new mutation procedure for Evolutionary Programming (EP) approaches, based on Iterated Function Systems (IFSs). The new mutation procedure proposed consists of considering a set of IFS which are able to generate fractal structures in a two-dimensional phase space, and use them to modify a current individual of the EP algorithm, instead of using random numbers from different probability density functions. We test this new proposal in a set of benchmark functions for continuous optimization problems. In this case, we compare the proposed mutation against classical Evolutionary Programming approaches, with mutations based on Gaussian, Cauchy and chaotic maps. We also include a discussion on the IFS-based mutation in a real application of Tuned Mass Dumper (TMD) location and optimization for vibration cancellation in buildings. In both practical cases, the proposed EP with the IFS-based mutation obtained extremely competitive results compared to alternative classical mutation operators.
Communication: A difference density picture for the self-consistent field ansatz.

PubMed

Parrish, Robert M; Liu, Fang; Martínez, Todd J

2016-04-07

We formulate self-consistent field (SCF) theory in terms of an interaction picture where the working variable is the difference density matrix between the true system and a corresponding superposition of atomic densities. As the difference density matrix directly represents the electronic deformations inherent in chemical bonding, this "difference self-consistent field (dSCF)" picture provides a number of significant conceptual and computational advantages. We show that this allows for a stable and efficient dSCF iterative procedure with wholly single-precision Coulomb and exchange matrix builds. We also show that the dSCF iterative procedure can be performed with aggressive screening of the pair space. These approximations are tested and found to be accurate for systems with up to 1860 atoms and >10 000 basis functions, providing for immediate overall speedups of up to 70% in the heavily optimized TeraChem SCF implementation.
Communication: A difference density picture for the self-consistent field ansatz

NASA Astrophysics Data System (ADS)

Parrish, Robert M.; Liu, Fang; Martínez, Todd J.

2016-04-01

We formulate self-consistent field (SCF) theory in terms of an interaction picture where the working variable is the difference density matrix between the true system and a corresponding superposition of atomic densities. As the difference density matrix directly represents the electronic deformations inherent in chemical bonding, this "difference self-consistent field (dSCF)" picture provides a number of significant conceptual and computational advantages. We show that this allows for a stable and efficient dSCF iterative procedure with wholly single-precision Coulomb and exchange matrix builds. We also show that the dSCF iterative procedure can be performed with aggressive screening of the pair space. These approximations are tested and found to be accurate for systems with up to 1860 atoms and >10 000 basis functions, providing for immediate overall speedups of up to 70% in the heavily optimized TeraChem SCF implementation.
CAD-Based Shielding Analysis for ITER Port Diagnostics

NASA Astrophysics Data System (ADS)

Serikov, Arkady; Fischer, Ulrich; Anthoine, David; Bertalot, Luciano; De Bock, Maartin; O'Connor, Richard; Juarez, Rafael; Krasilnikov, Vitaly

2017-09-01

Radiation shielding analysis conducted in support of design development of the contemporary diagnostic systems integrated inside the ITER ports is relied on the use of CAD models. This paper presents the CAD-based MCNP Monte Carlo radiation transport and activation analyses for the Diagnostic Upper and Equatorial Port Plugs (UPP #3 and EPP #8, #17). The creation process of the complicated 3D MCNP models of the diagnostics systems was substantially accelerated by application of the CAD-to-MCNP converter programs MCAM and McCad. High performance computing resources of the Helios supercomputer allowed to speed-up the MCNP parallel transport calculations with the MPI/OpenMP interface. The found shielding solutions could be universal, reducing ports R&D costs. The shield block behind the Tritium and Deposit Monitor (TDM) optical box was added to study its influence on Shut-Down Dose Rate (SDDR) in Port Interspace (PI) of EPP#17. Influence of neutron streaming along the Lost Alpha Monitor (LAM) on the neutron energy spectra calculated in the Tangential Neutron Spectrometer (TNS) of EPP#8. For the UPP#3 with Charge eXchange Recombination Spectroscopy (CXRS-core), an excessive neutron streaming along the CXRS shutter, which should be prevented in further design iteration.
Development of a real-time system for ITER first wall heat load control

NASA Astrophysics Data System (ADS)

Anand, Himank; de Vries, Peter; Gribov, Yuri; Pitts, Richard; Snipes, Joseph; Zabeo, Luca

2017-10-01

The steady state heat flux on the ITER first wall (FW) panels are limited by the heat removal capacity of the water cooling system. In case of off-normal events (e.g. plasma displacement during H-L transitions), the heat loads are predicted to exceed the design limits (2-4.7 MW/m2). Intense heat loads are predicted on the FW, even well before the burning plasma phase. Thus, a real-time (RT) FW heat load control system is mandatory from early plasma operation of the ITER tokamak. A heat load estimator based on the RT equilibrium reconstruction has been developed for the plasma control system (PCS). A scheme, estimating the energy state for prescribed gaps defined as the distance between the last closed flux surface (LCFS)/separatrix and the FW is presented. The RT energy state is determined by the product of a weighted function of gap distance and the power crossing the plasma boundary. In addition, a heat load estimator assuming a simplified FW geometry and parallel heat transport model in the scrape-off layer (SOL), benchmarked against a full 3-D magnetic field line tracer is also presented.
A finite element solver for 3-D compressible viscous flows

NASA Technical Reports Server (NTRS)

Reddy, K. C.; Reddy, J. N.; Nayani, S.

1990-01-01

Computation of the flow field inside a space shuttle main engine (SSME) requires the application of state of the art computational fluid dynamic (CFD) technology. Several computer codes are under development to solve 3-D flow through the hot gas manifold. Some algorithms were designed to solve the unsteady compressible Navier-Stokes equations, either by implicit or explicit factorization methods, using several hundred or thousands of time steps to reach a steady state solution. A new iterative algorithm is being developed for the solution of the implicit finite element equations without assembling global matrices. It is an efficient iteration scheme based on a modified nonlinear Gauss-Seidel iteration with symmetric sweeps. The algorithm is analyzed for a model equation and is shown to be unconditionally stable. Results from a series of test problems are presented. The finite element code was tested for couette flow, which is flow under a pressure gradient between two parallel plates in relative motion. Another problem that was solved is viscous laminar flow over a flat plate. The general 3-D finite element code was used to compute the flow in an axisymmetric turnaround duct at low Mach numbers.
The ITER ICRF Antenna Design with TOPICA

NASA Astrophysics Data System (ADS)

Milanesio, Daniele; Maggiora, Riccardo; Meneghini, Orso; Vecchi, Giuseppe

2007-11-01

TOPICA (Torino Polytechnic Ion Cyclotron Antenna) code is an innovative tool for the 3D/1D simulation of Ion Cyclotron Radio Frequency (ICRF), i.e. accounting for antennas in a realistic 3D geometry and with an accurate 1D plasma model [1]. The TOPICA code has been deeply parallelized and has been already proved to be a reliable tool for antennas design and performance prediction. A detailed analysis of the 24 straps ITER ICRF antenna geometry has been carried out, underlining the strong dependence and asymmetries of the antenna input parameters due to the ITER plasma response. We optimized the antenna array geometry dimensions to maximize loading, lower mutual couplings and mitigate sheath effects. The calculated antenna input impedance matrices are TOPICA results of a paramount importance for the tuning and matching system design. Electric field distributions have been also calculated and they are used as the main input for the power flux estimation tool. The designed optimized antenna is capable of coupling 20 MW of power to plasma in the 40 -- 55 MHz frequency range with a maximum voltage of 45 kV in the feeding coaxial cables. [1] V. Lancellotti et al., Nuclear Fusion, 46 (2006) S476-S499
Iterative-Transform Phase Retrieval Using Adaptive Diversity

NASA Technical Reports Server (NTRS)

Dean, Bruce H.

2007-01-01

A phase-diverse iterative-transform phase-retrieval algorithm enables high spatial-frequency, high-dynamic-range, image-based wavefront sensing. [The terms phase-diverse, phase retrieval, image-based, and wavefront sensing are defined in the first of the two immediately preceding articles, Broadband Phase Retrieval for Image-Based Wavefront Sensing (GSC-14899-1).] As described below, no prior phase-retrieval algorithm has offered both high dynamic range and the capability to recover high spatial-frequency components. Each of the previously developed image-based phase-retrieval techniques can be classified into one of two categories: iterative transform or parametric. Among the modifications of the original iterative-transform approach has been the introduction of a defocus diversity function (also defined in the cited companion article). Modifications of the original parametric approach have included minimizing alternative objective functions as well as implementing a variety of nonlinear optimization methods. The iterative-transform approach offers the advantage of ability to recover low, middle, and high spatial frequencies, but has disadvantage of having a limited dynamic range to one wavelength or less. In contrast, parametric phase retrieval offers the advantage of high dynamic range, but is poorly suited for recovering higher spatial frequency aberrations. The present phase-diverse iterative transform phase-retrieval algorithm offers both the high-spatial-frequency capability of the iterative-transform approach and the high dynamic range of parametric phase-recovery techniques. In implementation, this is a focus-diverse iterative-transform phaseretrieval algorithm that incorporates an adaptive diversity function, which makes it possible to avoid phase unwrapping while preserving high-spatial-frequency recovery. The algorithm includes an inner and an outer loop (see figure). An initial estimate of phase is used to start the algorithm on the inner loop, wherein multiple intensity images are processed, each using a different defocus value. The processing is done by an iterative-transform method, yielding individual phase estimates corresponding to each image of the defocus-diversity data set. These individual phase estimates are combined in a weighted average to form a new phase estimate, which serves as the initial phase estimate for either the next iteration of the iterative-transform method or, if the maximum number of iterations has been reached, for the next several steps, which constitute the outerloop portion of the algorithm. The details of the next several steps must be omitted here for the sake of brevity. The overall effect of these steps is to adaptively update the diversity defocus values according to recovery of global defocus in the phase estimate. Aberration recovery varies with differing amounts as the amount of diversity defocus is updated in each image; thus, feedback is incorporated into the recovery process. This process is iterated until the global defocus error is driven to zero during the recovery process. The amplitude of aberration may far exceed one wavelength after completion of the inner-loop portion of the algorithm, and the classical iterative transform method does not, by itself, enable recovery of multi-wavelength aberrations. Hence, in the absence of a means of off-loading the multi-wavelength portion of the aberration, the algorithm would produce a wrapped phase map. However, a special aberration-fitting procedure can be applied to the wrapped phase data to transfer at least some portion of the multi-wavelength aberration to the diversity function, wherein the data are treated as known phase values. In this way, a multiwavelength aberration can be recovered incrementally by successively applying the aberration-fitting procedure to intermediate wrapped phase maps. During recovery, as more of the aberration is transferred to the diversity function following successive iterations around the ter loop, the estimated phase ceases to wrap in places where the aberration values become incorporated as part of the diversity function. As a result, as the aberration content is transferred to the diversity function, the phase estimate resembles that of a reference flat.
Optical systolic array processor using residue arithmetic

NASA Technical Reports Server (NTRS)

Jackson, J.; Casasent, D.

1983-01-01

The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
Power deposition on misaligned castellated tungsten blocks in the Magnum-PSI and Pilot-PSI linear devices

NASA Astrophysics Data System (ADS)

Morgan, T. W.; van den Berg, M. A.; De Temmerman, G.; Bardin, S.; Aussems, D. U. B.; Pitts, R. A.

2017-12-01

For the final design of the ITER divertor it is important to determine whether shaping of each tungsten monoblock to eliminate leading edges is required or not. In order to aid this decision, two experiments were performed in DIFFER’s linear plasma devices to study heat loads on misaligned water cooled blocks at glancing incidence. First, a series of tungsten blocks were exposed to a high parallel heat flux (26 MW \
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1985-01-01

Synopses are given for NASA supported work in computer science at the University of Virginia. Some areas of research include: error seeding as a testing method; knowledge representation for engineering design; analysis of faults in a multi-version software experiment; implementation of a parallel programming environment; two computer graphics systems for visualization of pressure distribution and convective density particles; task decomposition for multiple robot arms; vectorized incomplete conjugate gradient; and iterative methods for solving linear equations on the Flex/32.
Exploiting Data Sparsity in Parallel Matrix Powers Computations

DTIC Science & Technology

2013-05-03

2013 Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour...matrices of the form A = D+USV H, where D is sparse and USV H has low rank but may be dense. Matrices of this form arise in many practical applications...methods numerical partial di erential equation solvers, and preconditioned iterative methods. If A has this form , our algorithm enables a communication
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chow, Edmond

Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then solving these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the factorization that is desired.

Calculation of Moment Matrix Elements for Bilinear Quadrilaterals and Higher-Order Basis Functions

DTIC Science & Technology

2016-01-06

methods are known as boundary integral equation (BIE) methods and the present study falls into this category. The numerical solution of the BIE is...iterated integrals. The inner integral involves the product of the free-space Green’s function for the Helmholtz equation multiplied by an appropriate...Website: http://www.wipl-d.com/ 5. Y. Zhang and T. K. Sarkar, Parallel Solution of Integral Equation -Based EM Problems in the Frequency Domain. New
Nonlinear PP and PS joint inversion based on the exact Zoeppritz equations: a two-stage procedure

NASA Astrophysics Data System (ADS)

Zhi, Lixia; Chen, Shuangquan; Song, Baoshan; Li, Xiang-yang

2018-04-01

S-velocity and density are very important parameters in distinguishing lithology and estimating other petrophysical properties. A reliable estimate of S-velocity and density is very difficult to obtain, even from long-offset gather data. Joint inversion of PP and PS data provides a promising strategy for stabilizing and improving the results of inversion in estimating elastic parameters and density. For 2D or 3D inversion, the trace-by-trace strategy is still the most widely used method although it often suffers from a lack of clarity because of its high efficiency, which is due to parallel computing. This paper describes a two-stage inversion method for nonlinear PP and PS joint inversion based on the exact Zoeppritz equations. There are several advantages for our proposed methods as follows: (1) Thanks to the exact Zoeppritz equation, our joint inversion method is applicable for wide angle amplitude-versus-angle inversion; (2) The use of both P- and S-wave information can further enhance the stability and accuracy of parameter estimation, especially for the S-velocity and density; (3) The two-stage inversion procedure proposed in this paper can achieve a good compromise between efficiency and precision. On the one hand, the trace-by-trace strategy used in the first stage can be processed in parallel so that it has high computational efficiency. On the other hand, to deal with the indistinctness of and undesired disturbances to the inversion results obtained from the first stage, we apply the second stage—total variation (TV) regularization. By enforcing spatial and temporal constraints, the TV regularization stage deblurs the inversion results and leads to parameter estimation with greater precision. Notably, the computation consumption of the TV regularization stage can be ignored compared to the first stage because it is solved using the fast split Bregman iterations. Numerical examples using a well log and the Marmousi II model show that the proposed joint inversion is a reliable method capable of accurately estimating the density parameter as well as P-wave velocity and S-wave velocity, even when the seismic data is noisy with signal-to-noise ratio of 5.
Bin-Hash Indexing: A Parallel Method for Fast Query Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

2008-06-27

This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that can not bemore » resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.« less
An Efficient Algorithm for Perturbed Orbit Integration Combining Analytical Continuation and Modified Chebyshev Picard Iteration

NASA Astrophysics Data System (ADS)

Elgohary, T.; Kim, D.; Turner, J.; Junkins, J.

2014-09-01

Several methods exist for integrating the motion in high order gravity fields. Some recent methods use an approximate starting orbit, and an efficient method is needed for generating warm starts that account for specific low order gravity approximations. By introducing two scalar Lagrange-like invariants and employing Leibniz product rule, the perturbed motion is integrated by a novel recursive formulation. The Lagrange-like invariants allow exact arbitrary order time derivatives. Restricting attention to the perturbations due to the zonal harmonics J2 through J6, we illustrate an idea. The recursively generated vector-valued time derivatives for the trajectory are used to develop a continuation series-based solution for propagating position and velocity. Numerical comparisons indicate performance improvements of ~ 70X over existing explicit Runge-Kutta methods while maintaining mm accuracy for the orbit predictions. The Modified Chebyshev Picard Iteration (MCPI) is an iterative path approximation method to solve nonlinear ordinary differential equations. The MCPI utilizes Picard iteration with orthogonal Chebyshev polynomial basis functions to recursively update the states. The key advantages of the MCPI are as follows: 1) Large segments of a trajectory can be approximated by evaluating the forcing function at multiple nodes along the current approximation during each iteration. 2) It can readily handle general gravity perturbations as well as non-conservative forces. 3) Parallel applications are possible. The Picard sequence converges to the solution over large time intervals when the forces are continuous and differentiable. According to the accuracy of the starting solutions, however, the MCPI may require significant number of iterations and function evaluations compared to other integrators. In this work, we provide an efficient methodology to establish good starting solutions from the continuation series method; this warm start improves the performance of the MCPI significantly and will likely be useful for other applications where efficiently computed approximate orbit solutions are needed.
Development of a mirror-based endoscope for divertor spectroscopy on JET with the new ITER-like wall (invited).

PubMed

Huber, A; Brezinsek, S; Mertens, Ph; Schweer, B; Sergienko, G; Terra, A; Arnoux, G; Balshaw, N; Clever, M; Edlingdon, T; Egner, S; Farthing, J; Hartl, M; Horton, L; Kampf, D; Klammer, J; Lambertz, H T; Matthews, G F; Morlock, C; Murari, A; Reindl, M; Riccardo, V; Samm, U; Sanders, S; Stamp, M; Williams, J; Zastrow, K D; Zauner, C

2012-10-01

A new endoscope with optimised divertor view has been developed in order to survey and monitor the emission of specific impurities such as tungsten and the remaining carbon as well as beryllium in the tungsten divertor of JET after the implementation of the ITER-like wall in 2011. The endoscope is a prototype for testing an ITER relevant design concept based on reflective optics only. It may be subject to high neutron fluxes as expected in ITER. The operating wavelength range, from 390 nm to 2500 nm, allows the measurements of the emission of all expected impurities (W I, Be II, C I, C II, C III) with high optical transmittance (≥ 30% in the designed wavelength range) as well as high spatial resolution that is ≤ 2 mm at the object plane and ≤ 3 mm for the full depth of field (± 0.7 m). The new optical design includes options for in situ calibration of the endoscope transmittance during the experimental campaign, which allows the continuous tracing of possible transmittance degradation with time due to impurity deposition and erosion by fast neutral particles. In parallel to the new optical design, a new type of possibly ITER relevant shutter system based on pneumatic techniques has been developed and integrated into the endoscope head. The endoscope is equipped with four digital CCD cameras, each combined with two filter wheels for narrow band interference and neutral density filters. Additionally, two protection cameras in the λ > 0.95 μm range have been integrated in the optical design for the real time wall protection during the plasma operation of JET.
Low pressure and high power rf sources for negative hydrogen ions for fusion applications (ITER neutral beam injection).

PubMed

Fantz, U; Franzen, P; Kraus, W; Falter, H D; Berger, M; Christ-Koch, S; Fröschle, M; Gutser, R; Heinemann, B; Martens, C; McNeely, P; Riedl, R; Speth, E; Wünderlich, D

2008-02-01

The international fusion experiment ITER requires for the plasma heating and current drive a neutral beam injection system based on negative hydrogen ion sources at 0.3 Pa. The ion source must deliver a current of 40 A D(-) for up to 1 h with an accelerated current density of 200 Am/(2) and a ratio of coextracted electrons to ions below 1. The extraction area is 0.2 m(2) from an aperture array with an envelope of 1.5 x 0.6 m(2). A high power rf-driven negative ion source has been successfully developed at the Max-Planck Institute for Plasma Physics (IPP) at three test facilities in parallel. Current densities of 330 and 230 Am/(2) have been achieved for hydrogen and deuterium, respectively, at a pressure of 0.3 Pa and an electron/ion ratio below 1 for a small extraction area (0.007 m(2)) and short pulses (<4 s). In the long pulse experiment, equipped with an extraction area of 0.02 m(2), the pulse length has been extended to 3600 s. A large rf source, with the width and half the height of the ITER source but without extraction system, is intended to demonstrate the size scaling and plasma homogeneity of rf ion sources. The source operates routinely now. First results on plasma homogeneity obtained from optical emission spectroscopy and Langmuir probes are very promising. Based on the success of the IPP development program, the high power rf-driven negative ion source has been chosen recently for the ITER beam systems in the ITER design review process.
Advanced density profile reflectometry; the state-of-the-art and measurement prospects for ITER

NASA Astrophysics Data System (ADS)

Doyle, E. J.

2006-10-01

Dramatic progress in millimeter-wave technology has allowed the realization of a key goal for ITER diagnostics, the routine measurement of the plasma density profile from millimeter-wave radar (reflectometry) measurements. In reflectometry, the measured round-trip group delay of a probe beam reflected from a plasma cutoff is used to infer the density distribution in the plasma. Reflectometer systems implemented by UCLA on a number of devices employ frequency-modulated continuous-wave (FM-CW), ultrawide-bandwidth, high-resolution radar systems. One such system on DIII-D has routinely demonstrated measurements of the density profile over a range of electron density of 0-6.4x10^19,m-3, with ˜25 μs time and ˜4 mm radial resolution, meeting key ITER requirements. This progress in performance was made possible by multiple advances in the areas of millimeter-wave technology, novel measurement techniques, and improved understanding, including: (i) fast sweep, solid-state, wide bandwidth sources and power amplifiers, (ii) dual polarization measurements to expand the density range, (iii) adaptive radar-based data analysis with parallel processing on a Unix cluster, (iv) high memory depth data acquisition, and (v) advances in full wave code modeling. The benefits of advanced system performance will be illustrated using measurements from a wide range of phenomena, including ELM and fast-ion driven mode dynamics, L-H transition studies and plasma-wall interaction. The measurement capabilities demonstrated by these systems provide a design basis for the development of the main ITER profile reflectometer system. This talk will explore the extent to which these reflectometer system designs, results and experience can be translated to ITER, and will identify what new studies and experimental tests are essential.
Single-step reinitialization and extending algorithms for level-set based multi-phase flow simulations

NASA Astrophysics Data System (ADS)

Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.

2017-12-01

We propose efficient single-step formulations for reinitialization and extending algorithms, which are critical components of level-set based interface-tracking methods. The level-set field is reinitialized with a single-step (non iterative) "forward tracing" algorithm. A minimum set of cells is defined that describes the interface, and reinitialization employs only data from these cells. Fluid states are extrapolated or extended across the interface by a single-step "backward tracing" algorithm. Both algorithms, which are motivated by analogy to ray-tracing, avoid multiple block-boundary data exchanges that are inevitable for iterative reinitialization and extending approaches within a parallel-computing environment. The single-step algorithms are combined with a multi-resolution conservative sharp-interface method and validated by a wide range of benchmark test cases. We demonstrate that the proposed reinitialization method achieves second-order accuracy in conserving the volume of each phase. The interface location is invariant to reapplication of the single-step reinitialization. Generally, we observe smaller absolute errors than for standard iterative reinitialization on the same grid. The computational efficiency is higher than for the standard and typical high-order iterative reinitialization methods. We observe a 2- to 6-times efficiency improvement over the standard method for serial execution. The proposed single-step extending algorithm, which is commonly employed for assigning data to ghost cells with ghost-fluid or conservative interface interaction methods, shows about 10-times efficiency improvement over the standard method while maintaining same accuracy. Despite their simplicity, the proposed algorithms offer an efficient and robust alternative to iterative reinitialization and extending methods for level-set based multi-phase simulations.
Development of a mirror-based endoscope for divertor spectroscopy on JET with the new ITER-like wall (invited)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huber, A.; Brezinsek, S.; Mertens, Ph.

2012-10-15

A new endoscope with optimised divertor view has been developed in order to survey and monitor the emission of specific impurities such as tungsten and the remaining carbon as well as beryllium in the tungsten divertor of JET after the implementation of the ITER-like wall in 2011. The endoscope is a prototype for testing an ITER relevant design concept based on reflective optics only. It may be subject to high neutron fluxes as expected in ITER. The operating wavelength range, from 390 nm to 2500 nm, allows the measurements of the emission of all expected impurities (W I, Be II,more » C I, C II, C III) with high optical transmittance ({>=}30% in the designed wavelength range) as well as high spatial resolution that is {<=}2 mm at the object plane and {<=}3 mm for the full depth of field ({+-}0.7 m). The new optical design includes options for in situ calibration of the endoscope transmittance during the experimental campaign, which allows the continuous tracing of possible transmittance degradation with time due to impurity deposition and erosion by fast neutral particles. In parallel to the new optical design, a new type of possibly ITER relevant shutter system based on pneumatic techniques has been developed and integrated into the endoscope head. The endoscope is equipped with four digital CCD cameras, each combined with two filter wheels for narrow band interference and neutral density filters. Additionally, two protection cameras in the {lambda} > 0.95 {mu}m range have been integrated in the optical design for the real time wall protection during the plasma operation of JET.« less
Error Control Coding Techniques for Space and Satellite Communications

NASA Technical Reports Server (NTRS)

Costello, Daniel J., Jr.; Takeshita, Oscar Y.; Cabral, Hermano A.

1998-01-01

It is well known that the BER performance of a parallel concatenated turbo-code improves roughly as 1/N, where N is the information block length. However, it has been observed by Benedetto and Montorsi that for most parallel concatenated turbo-codes, the FER performance does not improve monotonically with N. In this report, we study the FER of turbo-codes, and the effects of their concatenation with an outer code. Two methods of concatenation are investigated: across several frames and within each frame. Some asymmetric codes are shown to have excellent FER performance with an information block length of 16384. We also show that the proposed outer coding schemes can improve the BER performance as well by eliminating pathological frames generated by the iterative MAP decoding process.
Distributed Optimal Power Flow of AC/DC Interconnected Power Grid Using Synchronous ADMM

NASA Astrophysics Data System (ADS)

Liang, Zijun; Lin, Shunjiang; Liu, Mingbo

2017-05-01

Distributed optimal power flow (OPF) is of great importance and challenge to AC/DC interconnected power grid with different dispatching centres, considering the security and privacy of information transmission. In this paper, a fully distributed algorithm for OPF problem of AC/DC interconnected power grid called synchronous ADMM is proposed, and it requires no form of central controller. The algorithm is based on the fundamental alternating direction multiplier method (ADMM), by using the average value of boundary variables of adjacent regions obtained from current iteration as the reference values of both regions for next iteration, which realizes the parallel computation among different regions. The algorithm is tested with the IEEE 11-bus AC/DC interconnected power grid, and by comparing the results with centralized algorithm, we find it nearly no differences, and its correctness and effectiveness can be validated.
Present limits and improvements of structural materials for fusion reactors - a review

NASA Astrophysics Data System (ADS)

Tavassoli, A.-A. F.

2002-04-01

Since the transition from ITER or DEMO to a commercial power reactor would involve a significant change in system and materials options, a parallel R&D path has been put in place in Europe to address these issues. This paper assesses the structural materials part of this program along with the latest R&D results from the main programs. It is shown that stainless steels and ferritic/martensitic steels, retained for ITER and DEMO, will also remain the principal contenders for the future FPR, despite uncertainties over irradiation induced embrittlement at low temperatures and consequences of high He/dpa ratio. Neither one of the present advanced high temperature materials has to this date the structural integrity reliability needed for application in critical components. This situation is unlikely to change with the materials R&D alone and has to be mitigated in close collaboration with blanket system design.
Beam ion acceleration by ICRH in JET discharges

NASA Astrophysics Data System (ADS)

Budny, R. V.; Gorelenkova, M.; Bertelli, N.; JET Collaboration

2015-11-01

The ion Monte-Carlo orbit integrator NUBEAM, used in TRANSP has been enhanced to include an ``RF-kick'' operator to simulate the interaction of RF fields and fast ions. The RF quasi-linear operator (localized in space) uses a second R-Z orbit integrator. We apply this to analysis of recent JET discharges using ICRH with the ITER-like first wall. An example of results for a high performance Hybrid discharge for which standard TRANSP analysis simulated the DD neutron emission rate below measurements, re-analysis using the RF-kick operator results in increased beam parallel and perpendicular energy densities (~=40% and 15% respectively), and increased beam-thermal neutron emission (~= 35%), making the total rate closer to the measurement. Checks of the numerics, comparisons with measurements, and ITER implications will be presented. Supported in part by the US DoE contract DE-AC02-09CH11466 and by EUROfusion No 633053.
Dynamic phasing of multichannel cw laser radiation by means of a stochastic gradient algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Volkov, V A; Volkov, M V; Garanin, S G

2013-09-30

The phasing of a multichannel laser beam by means of an iterative stochastic parallel gradient (SPG) algorithm has been numerically and experimentally investigated. The operation of the SPG algorithm is simulated, the acceptable range of amplitudes of probe phase shifts is found, and the algorithm parameters at which the desired Strehl number can be obtained with a minimum number of iterations are determined. An experimental bench with phase modulators based on lithium niobate, which are controlled by a multichannel electronic unit with a real-time microcontroller, has been designed. Phasing of 16 cw laser beams at a system response bandwidth ofmore » 3.7 kHz and phase thermal distortions in a frequency band of about 10 Hz is experimentally demonstrated. The experimental data are in complete agreement with the calculation results. (control of laser radiation parameters)« less
A high data rate universal lattice decoder on FPGA

NASA Astrophysics Data System (ADS)

Ma, Jing; Huang, Xinming; Kura, Swapna

2005-06-01

This paper presents the architecture design of a high data rate universal lattice decoder for MIMO channels on FPGA platform. A phost strategy based lattice decoding algorithm is modified in this paper to reduce the complexity of the closest lattice point search. The data dependency of the improved algorithm is examined and a parallel and pipeline architecture is developed with the iterative decoding function on FPGA and the division intensive channel matrix preprocessing on DSP. Simulation results demonstrate that the improved lattice decoding algorithm provides better bit error rate and less iteration number compared with the original algorithm. The system prototype of the decoder shows that it supports data rate up to 7Mbit/s on a Virtex2-1000 FPGA, which is about 8 times faster than the original algorithm on FPGA platform and two-orders of magnitude better than its implementation on a DSP platform.
Design of a tight frame of 2D shearlets based on a fast non-iterative analysis and synthesis algorithm

NASA Astrophysics Data System (ADS)

Goossens, Bart; Aelterman, Jan; Luong, Hi"p.; Pižurica, Aleksandra; Philips, Wilfried

2011-09-01

The shearlet transform is a recent sibling in the family of geometric image representations that provides a traditional multiresolution analysis combined with a multidirectional analysis. In this paper, we present a fast DFT-based analysis and synthesis scheme for the 2D discrete shearlet transform. Our scheme conforms to the continuous shearlet theory to high extent, provides perfect numerical reconstruction (up to floating point rounding errors) in a non-iterative scheme and is highly suitable for parallel implementation (e.g. FPGA, GPU). We show that our discrete shearlet representation is also a tight frame and the redundancy factor of the transform is around 2.6, independent of the number of analysis directions. Experimental denoising results indicate that the transform performs the same or even better than several related multiresolution transforms, while having a significantly lower redundancy factor.
A 3D finite-difference BiCG iterative solver with the Fourier-Jacobi preconditioner for the anisotropic EIT/EEG forward problem.

PubMed

Turovets, Sergei; Volkov, Vasily; Zherdetsky, Aleksej; Prakonina, Alena; Malony, Allen D

2014-01-01

The Electrical Impedance Tomography (EIT) and electroencephalography (EEG) forward problems in anisotropic inhomogeneous media like the human head belongs to the class of the three-dimensional boundary value problems for elliptic equations with mixed derivatives. We introduce and explore the performance of several new promising numerical techniques, which seem to be more suitable for solving these problems. The proposed numerical schemes combine the fictitious domain approach together with the finite-difference method and the optimally preconditioned Conjugate Gradient- (CG-) type iterative method for treatment of the discrete model. The numerical scheme includes the standard operations of summation and multiplication of sparse matrices and vector, as well as FFT, making it easy to implement and eligible for the effective parallel implementation. Some typical use cases for the EIT/EEG problems are considered demonstrating high efficiency of the proposed numerical technique.
Computer program for solving laminar, transitional, or turbulent compressible boundary-layer equations for two-dimensional and axisymmetric flow

NASA Technical Reports Server (NTRS)

Harris, J. E.; Blanchard, D. K.

1982-01-01

A numerical algorithm and computer program are presented for solving the laminar, transitional, or turbulent two dimensional or axisymmetric compressible boundary-layer equations for perfect-gas flows. The governing equations are solved by an iterative three-point implicit finite-difference procedure. The software, program VGBLP, is a modification of the approach presented in NASA TR R-368 and NASA TM X-2458, respectively. The major modifications are: (1) replacement of the fourth-order Runge-Kutta integration technique with a finite-difference procedure for numerically solving the equations required to initiate the parabolic marching procedure; (2) introduction of the Blottner variable-grid scheme; (3) implementation of an iteration scheme allowing the coupled system of equations to be converged to a specified accuracy level; and (4) inclusion of an iteration scheme for variable-entropy calculations. These modifications to the approach presented in NASA TR R-368 and NASA TM X-2458 yield a software package with high computational efficiency and flexibility. Turbulence-closure options include either two-layer eddy-viscosity or mixing-length models. Eddy conductivity is modeled as a function of eddy viscosity through a static turbulent Prandtl number formulation. Several options are provided for specifying the static turbulent Prandtl number. The transitional boundary layer is treated through a streamwise intermittency function which modifies the turbulence-closure model. This model is based on the probability distribution of turbulent spots and ranges from zero to unity for laminar and turbulent flow, respectively. Several test cases are presented as guides for potential users of the software.
Role of the Controller in an Integrated Pilot-Controller Study for Parallel Approaches

NASA Technical Reports Server (NTRS)

Verma, Savvy; Kozon, Thomas; Ballinger, Debbi; Lozito, Sandra; Subramanian, Shobana

2011-01-01

Closely spaced parallel runway operations have been found to increase capacity within the National Airspace System but poor visibility conditions reduce the use of these operations [1]. Previous research examined the concepts and procedures related to parallel runways [2][4][5]. However, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot s and controller s procedures and information requirements for creating aircraft pairs for closely spaced parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15s (+/- 10s error) at a coupling point that was 12 nmi from the runway threshold. In this paper, the role of the controller, as examined in an integrated study of controllers and pilots, is presented. The controllers utilized a pairing scheduler and new pairing interfaces to help create and maintain aircraft pairs, in a high-fidelity, human-in-the loop simulation experiment. Results show that the controllers worked as a team to achieve pairing between aircraft and the level of inter-controller coordination increased when the aircraft in the pair belonged to different sectors. Controller feedback did not reveal over reliance on the automation nor complacency with the pairing automation or pairing procedures.
Inferring the demographic history from DNA sequences: An importance sampling approach based on non-homogeneous processes.

PubMed

Ait Kaci Azzou, S; Larribe, F; Froda, S

2016-10-01

In Ait Kaci Azzou et al. (2015) we introduced an Importance Sampling (IS) approach for estimating the demographic history of a sample of DNA sequences, the skywis plot. More precisely, we proposed a new nonparametric estimate of a population size that changes over time. We showed on simulated data that the skywis plot can work well in typical situations where the effective population size does not undergo very steep changes. In this paper, we introduce an iterative procedure which extends the previous method and gives good estimates under such rapid variations. In the iterative calibrated skywis plot we approximate the effective population size by a piecewise constant function, whose values are re-estimated at each step. These piecewise constant functions are used to generate the waiting times of non homogeneous Poisson processes related to a coalescent process with mutation under a variable population size model. Moreover, the present IS procedure is based on a modified version of the Stephens and Donnelly (2000) proposal distribution. Finally, we apply the iterative calibrated skywis plot method to a simulated data set from a rapidly expanding exponential model, and we show that the method based on this new IS strategy correctly reconstructs the demographic history. Copyright © 2016. Published by Elsevier Inc.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.