parallel iterative solution: Topics by Science.gov

Sample records for parallel iterative solution

Single-agent parallel window search

NASA Technical Reports Server (NTRS)

Powley, Curt; Korf, Richard E.

1991-01-01

Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A(asterisk) (IDA-asterisk) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA-asterisk by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.
An iterative method for systems of nonlinear hyperbolic equations

NASA Technical Reports Server (NTRS)

Scroggs, Jeffrey S.

1989-01-01

An iterative algorithm for the efficient solution of systems of nonlinear hyperbolic equations is presented. Parallelism is evident at several levels. In the formation of the iteration, the equations are decoupled, thereby providing large grain parallelism. Parallelism may also be exploited within the solves for each equation. Convergence of the interation is established via a bounding function argument. Experimental results in two-dimensions are presented.
Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

NASA Astrophysics Data System (ADS)

Zerr, Robert Joseph

2011-12-01

The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

NASA Technical Reports Server (NTRS)

Fijany, Amir

1993-01-01

In this paper, fast time- and Space -Parallel agorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completed decoupled.
Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Vectorized and multitasked solution of the few-group neutron diffusion equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zee, S.K.; Turinsky, P.J.; Shayer, Z.

1989-03-01

A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. Formore » the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7. 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of --61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.« less
Solution of a tridiagonal system of equations on the finite element machine

NASA Technical Reports Server (NTRS)

Bostic, S. W.

1984-01-01

Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
Efficient Iterative Methods Applied to the Solution of Transonic Flows

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.

1996-02-01

We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
Parallelized implicit propagators for the finite-difference Schrödinger equation

NASA Astrophysics Data System (ADS)

Parker, Jonathan; Taylor, K. T.

1995-08-01

We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
Efficient iterative methods applied to the solution of transonic flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.

1996-02-01

We investigate the use of an inexact Newton`s method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton`s method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GIVIRES method. The preconditionermore » is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton- GIVIRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.« less
Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi

PubMed Central

Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil

2012-01-01

The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
A survey of packages for large linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Kesheng; Milne, Brent

2000-02-11

This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less
Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

NASA Technical Reports Server (NTRS)

Padovan, Joseph; Krishna, Lala; Gute, Douglas

1997-01-01

Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE PAGES

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

2016-11-07

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Domain decomposition preconditioners for the spectral collocation method

NASA Technical Reports Server (NTRS)

Quarteroni, Alfio; Sacchilandriani, Giovanni

1988-01-01

Several block iteration preconditioners are proposed and analyzed for the solution of elliptic problems by spectral collocation methods in a region partitioned into several rectangles. It is shown that convergence is achieved with a rate which does not depend on the polynomial degree of the spectral solution. The iterative methods here presented can be effectively implemented on multiprocessor systems due to their high degree of parallelism.
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Aerodynamic optimization by simultaneously updating flow variables and design parameters

NASA Technical Reports Server (NTRS)

Rizk, M. H.

1990-01-01

The application of conventional optimization schemes to aerodynamic design problems leads to inner-outer iterative procedures that are very costly. An alternative approach is presented based on the idea of updating the flow variable iterative solutions and the design parameter iterative solutions simultaneously. Two schemes based on this idea are applied to problems of correcting wind tunnel wall interference and optimizing advanced propeller designs. The first of these schemes is applicable to a limited class of two-design-parameter problems with an equality constraint. It requires the computation of a single flow solution. The second scheme is suitable for application to general aerodynamic problems. It requires the computation of several flow solutions in parallel. In both schemes, the design parameters are updated as the iterative flow solutions evolve. Computations are performed to test the schemes' efficiency, accuracy, and sensitivity to variations in the computational parameters.
New Bandwidth Efficient Parallel Concatenated Coding Schemes

NASA Technical Reports Server (NTRS)

Denedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.

1996-01-01

We propose a new solution to parallel concatenation of trellis codes with multilevel amplitude/phase modulations and a suitable iterative decoding structure. Examples are given for throughputs 2 bits/sec/Hz with 8PSK and 16QAM signal constellations.
Xyce

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thomquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.

Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Sankar, Lakshmi N.; Hixon, Duane

1991-01-01

Efficient iterative solution methods are being developed for the numerical solution of two- and three-dimensional compressible Navier-Stokes equations. Iterative time marching methods have several advantages over classical multi-step explicit time marching schemes, and non-iterative implicit time marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes. Thus, the extra work required by iterative schemes can also be designed to perform efficiently on current and future generation scalable, missively parallel machines. An obvious candidate for iteratively solving the system of coupled nonlinear algebraic equations arising in CFD applications is the Newton method. Newton's method was implemented in existing finite difference and finite volume methods. Depending on the complexity of the problem, the number of Newton iterations needed per step to solve the discretized system of equations can, however, vary dramatically from a few to several hundred. Another popular approach based on the classical conjugate gradient method, known as the GMRES (Generalized Minimum Residual) algorithm is investigated. The GMRES algorithm was used in the past by a number of researchers for solving steady viscous and inviscid flow problems with considerable success. Here, the suitability of this algorithm is investigated for solving the system of nonlinear equations that arise in unsteady Navier-Stokes solvers at each time step. Unlike the Newton method which attempts to drive the error in the solution at each and every node down to zero, the GMRES algorithm only seeks to minimize the L2 norm of the error. In the GMRES algorithm the changes in the flow properties from one time step to the next are assumed to be the sum of a set of orthogonal vectors. By choosing the number of vectors to a reasonably small value N (between 5 and 20) the work required for advancing the solution from one time step to the next may be kept to (N+1) times that of a noniterative scheme. Many of the operations required by the GMRES algorithm such as matrix-vector multiplies, matrix additions and subtractions can all be vectorized and parallelized efficiently.
Extending substructure based iterative solvers to multiple load and repeated analyses

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1993-01-01

Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Block iterative restoration of astronomical images with the massively parallel processor

NASA Technical Reports Server (NTRS)

Heap, Sara R.; Lindler, Don J.

1987-01-01

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Parallel iterative solution for h and p approximations of the shallow water equations

USGS Publications Warehouse

Barragy, E.J.; Walters, R.A.

1998-01-01

A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.
Adaptive implicit-explicit and parallel element-by-element iteration schemes

NASA Technical Reports Server (NTRS)

Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.

1989-01-01

Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.
A Two Colorable Fourth Order Compact Difference Scheme and Parallel Iterative Solution of the 3D Convection Diffusion Equation

NASA Technical Reports Server (NTRS)

Zhang, Jun; Ge, Lixin; Kouatchou, Jules

2000-01-01

A new fourth order compact difference scheme for the three dimensional convection diffusion equation with variable coefficients is presented. The novelty of this new difference scheme is that it Only requires 15 grid points and that it can be decoupled with two colors. The entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel type iterative method. This is compared with the known 19 point fourth order compact differenCe scheme which requires four colors to decouple the computational grid. Numerical results, with multigrid methods implemented on a shared memory parallel computer, are presented to compare the 15 point and the 19 point fourth order compact schemes.
Parallel solution of the symmetric tridiagonal eigenproblem. Research report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-10-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an IPSC hypercube multiprocessor reveal that Cuppen's method ismore » the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptions of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Parallel solution of the symmetric tridiagonal eigenproblem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-01-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accuratemore » approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Solutions of large-scale electromagnetics problems involving dielectric objects with the parallel multilevel fast multipole algorithm.

PubMed

Ergül, Özgür

2011-11-01

Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.
The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter

DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particlesmore » and in grid related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.« less
Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moryakov, A. V., E-mail: sailor@orc.ru

2016-12-15

An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson–Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chatterjee, Kausik, E-mail: kausik.chatterjee@aggiemail.usu.edu; Center for Atmospheric and Space Sciences, Utah State University, Logan, UT 84322; Roadcap, John R., E-mail: john.roadcap@us.af.mil

The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson–Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals ofmore » the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.« less
A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson-Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

NASA Astrophysics Data System (ADS)

Chatterjee, Kausik; Roadcap, John R.; Singh, Surendra

2014-11-01

The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson-Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals of the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.
A highly parallel multigrid-like method for the solution of the Euler equations

NASA Technical Reports Server (NTRS)

Tuminaro, Ray S.

1989-01-01

We consider a highly parallel multigrid-like method for the solution of the two dimensional steady Euler equations. The new method, introduced as filtering multigrid, is similar to a standard multigrid scheme in that convergence on the finest grid is accelerated by iterations on coarser grids. In the filtering method, however, additional fine grid subproblems are processed concurrently with coarse grid computations to further accelerate convergence. These additional problems are obtained by splitting the residual into a smooth and an oscillatory component. The smooth component is then used to form a coarse grid problem (similar to standard multigrid) while the oscillatory component is used for a fine grid subproblem. The primary advantage in the filtering approach is that fewer iterations are required and that most of the additional work per iteration can be performed in parallel with the standard coarse grid computations. We generalize the filtering algorithm to a version suitable for nonlinear problems. We emphasize that this generalization is conceptually straight-forward and relatively easy to implement. In particular, no explicit linearization (e.g., formation of Jacobians) needs to be performed (similar to the FAS multigrid approach). We illustrate the nonlinear version by applying it to the Euler equations, and presenting numerical results. Finally, a performance evaluation is made based on execution time models and convergence information obtained from numerical experiments.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*

PubMed Central

Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.

2012-01-01

This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200
MPF: A portable message passing facility for shared memory multiprocessors

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

1987-01-01

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Analysis of Monte Carlo accelerated iterative methods for sparse linear systems: Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

DOE PAGES

Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

2017-03-05

Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
Inversion of potential field data using the finite element method on parallel computers

NASA Astrophysics Data System (ADS)

Gross, L.; Altinay, C.; Shaw, S.

2015-11-01

In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs namely for the potential fields, for the adjoint defects and for the application of the preconditioner. In extension to the traditional discrete formulation the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by weighting regularization and cross-gradient but is independent of the resolution of PDE discretization and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.

Parallel computation of multigroup reactivity coefficient using iterative method

NASA Astrophysics Data System (ADS)

Susmikanti, Mike; Dewayatna, Winter

2013-09-01

One of the research activities to support the commercial radioisotope production program is a safety research target irradiation FPM (Fission Product Molybdenum). FPM targets form a tube made of stainless steel in which the nuclear degrees of superimposed high-enriched uranium. FPM irradiation tube is intended to obtain fission. The fission material widely used in the form of kits in the world of nuclear medicine. Irradiation FPM tube reactor core would interfere with performance. One of the disorders comes from changes in flux or reactivity. It is necessary to study a method for calculating safety terrace ongoing configuration changes during the life of the reactor, making the code faster became an absolute necessity. Neutron safety margin for the research reactor can be reused without modification to the calculation of the reactivity of the reactor, so that is an advantage of using perturbation method. The criticality and flux in multigroup diffusion model was calculate at various irradiation positions in some uranium content. This model has a complex computation. Several parallel algorithms with iterative method have been developed for the sparse and big matrix solution. The Black-Red Gauss Seidel Iteration and the power iteration parallel method can be used to solve multigroup diffusion equation system and calculated the criticality and reactivity coeficient. This research was developed code for reactivity calculation which used one of safety analysis with parallel processing. It can be done more quickly and efficiently by utilizing the parallel processing in the multicore computer. This code was applied for the safety limits calculation of irradiated targets FPM with increment Uranium.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2014-10-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

1999-01-01

This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
Fault Tolerant Parallel Implementations of Iterative Algorithms for Optimal Control Problems

DTIC Science & Technology

1988-01-21

p/.V)] steps, but did not discuss any specific parallel implementation. Gajski [51 improved upon this result by performing the SIMD computation in...N = p2. our approach reduces to that of [51, except that Gajski presents the coefficient computation and partial solution phases as a single...8217>. the SIMD algo- rithm presented by Gajski [5] can be most efficiently mapped to a unidirec- tional ring network with broadcasting capability. Based
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desai, Ajit; Khalil, Mohammad; Pettit, Chris

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Development of parallel algorithms for electrical power management in space applications

NASA Technical Reports Server (NTRS)

Berry, Frederick C.

1989-01-01

The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a loan flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an interactive library that greatly simplifies the parrallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matricesmore » for parallel solutions.« less
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
Parallel AFSA algorithm accelerating based on MIC architecture

NASA Astrophysics Data System (ADS)

Zhou, Junhao; Xiao, Hong; Huang, Yifan; Li, Yongzhao; Xu, Yuanrui

2017-05-01

Analysis AFSA past for solving the traveling salesman problem, the algorithm efficiency is often a big problem, and the algorithm processing method, it does not fully responsive to the characteristics of the traveling salesman problem to deal with, and therefore proposes a parallel join improved AFSA process. The simulation with the current TSP known optimal solutions were analyzed, the results showed that the AFSA iterations improved less, on the MIC cards doubled operating efficiency, efficiency significantly.
Capabilities of Fully Parallelized MHD Stability Code MARS

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2016-10-01

Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2015-11-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
Parallel approach for bioinspired algorithms

NASA Astrophysics Data System (ADS)

Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina

2018-05-01

In the paper, a probabilistic parallel approach based on the population heuristic, such as a genetic algorithm, is suggested. The authors proposed using a multithreading approach at the micro level at which new alternative solutions are generated. On each iteration, several threads that independently used the same population to generate new solutions can be started. After the work of all threads, a selection operator combines obtained results in the new population. To confirm the effectiveness of the suggested approach, the authors have developed software on the basis of which experimental computations can be carried out. The authors have considered a classic optimization problem – finding a Hamiltonian cycle in a graph. Experiments show that due to the parallel approach at the micro level, increment of running speed can be obtained on graphs with 250 and more vertices.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
Parallel medicinal chemistry approaches to selective HDAC1/HDAC2 inhibitor (SHI-1:2) optimization.

PubMed

Kattar, Solomon D; Surdi, Laura M; Zabierek, Anna; Methot, Joey L; Middleton, Richard E; Hughes, Bethany; Szewczak, Alexander A; Dahlberg, William K; Kral, Astrid M; Ozerova, Nicole; Fleming, Judith C; Wang, Hongmei; Secrist, Paul; Harsch, Andreas; Hamill, Julie E; Cruz, Jonathan C; Kenific, Candia M; Chenard, Melissa; Miller, Thomas A; Berk, Scott C; Tempest, Paul

2009-02-15

The successful application of both solid and solution phase library synthesis, combined with tight integration into the medicinal chemistry effort, resulted in the efficient optimization of a novel structural series of selective HDAC1/HDAC2 inhibitors by the MRL-Boston Parallel Medicinal Chemistry group. An initial lead from a small parallel library was found to be potent and selective in biochemical assays. Advanced compounds were the culmination of iterative library design and possess excellent biochemical and cellular potency, as well as acceptable PK and efficacy in animal models.
Reconstruction of coded aperture images

NASA Technical Reports Server (NTRS)

Bielefeld, Michael J.; Yin, Lo I.

1987-01-01

Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.
Multi-component Wronskian solution to the Kadomtsev-Petviashvili equation

NASA Astrophysics Data System (ADS)

Xu, Tao; Sun, Fu-Wei; Zhang, Yi; Li, Juan

2014-01-01

It is known that the Kadomtsev-Petviashvili (KP) equation can be decomposed into the first two members of the coupled Ablowitz-Kaup-Newell-Segur (AKNS) hierarchy by the binary non-linearization of Lax pairs. In this paper, we construct the N-th iterated Darboux transformation (DT) for the second- and third-order m-coupled AKNS systems. By using together the N-th iterated DT and Cramer's rule, we find that the KPII equation has the unreduced multi-component Wronskian solution and the KPI equation admits a reduced multi-component Wronskian solution. In particular, based on the unreduced and reduced two-component Wronskians, we obtain two families of fully-resonant line-soliton solutions which contain arbitrary numbers of asymptotic solitons as y → ∓∞ to the KPII equation, and the ordinary N-soliton solution to the KPI equation. In addition, we find that the KPI line solitons propagating in parallel can exhibit the bound state at the moment of collision.
Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cai, Xiao-Chuan; Keyes, David; Yang, Chao

2014-09-29

The focus of the project is on the development and customization of some highly scalable domain decomposition based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, solid deformation), each is modeled by a certain type of Cahn-Hilliard and/or Allen-Cahn equations. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have many advantages such as ease of implementationmore » since only single field solvers are needed, but also exhibit disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces the parallel efficiency. To overcome the disadvantages, fully coupled approaches have been investigated in order to obtain full physics simulations.« less
Calculation of Moment Matrix Elements for Bilinear Quadrilaterals and Higher-Order Basis Functions

DTIC Science & Technology

2016-01-06

methods are known as boundary integral equation (BIE) methods and the present study falls into this category. The numerical solution of the BIE is...iterated integrals. The inner integral involves the product of the free-space Green’s function for the Helmholtz equation multiplied by an appropriate...Website: http://www.wipl-d.com/ 5. Y. Zhang and T. K. Sarkar, Parallel Solution of Integral Equation -Based EM Problems in the Frequency Domain. New
An Efficient Algorithm for Perturbed Orbit Integration Combining Analytical Continuation and Modified Chebyshev Picard Iteration

NASA Astrophysics Data System (ADS)

Elgohary, T.; Kim, D.; Turner, J.; Junkins, J.

2014-09-01

Several methods exist for integrating the motion in high order gravity fields. Some recent methods use an approximate starting orbit, and an efficient method is needed for generating warm starts that account for specific low order gravity approximations. By introducing two scalar Lagrange-like invariants and employing Leibniz product rule, the perturbed motion is integrated by a novel recursive formulation. The Lagrange-like invariants allow exact arbitrary order time derivatives. Restricting attention to the perturbations due to the zonal harmonics J2 through J6, we illustrate an idea. The recursively generated vector-valued time derivatives for the trajectory are used to develop a continuation series-based solution for propagating position and velocity. Numerical comparisons indicate performance improvements of ~ 70X over existing explicit Runge-Kutta methods while maintaining mm accuracy for the orbit predictions. The Modified Chebyshev Picard Iteration (MCPI) is an iterative path approximation method to solve nonlinear ordinary differential equations. The MCPI utilizes Picard iteration with orthogonal Chebyshev polynomial basis functions to recursively update the states. The key advantages of the MCPI are as follows: 1) Large segments of a trajectory can be approximated by evaluating the forcing function at multiple nodes along the current approximation during each iteration. 2) It can readily handle general gravity perturbations as well as non-conservative forces. 3) Parallel applications are possible. The Picard sequence converges to the solution over large time intervals when the forces are continuous and differentiable. According to the accuracy of the starting solutions, however, the MCPI may require significant number of iterations and function evaluations compared to other integrators. In this work, we provide an efficient methodology to establish good starting solutions from the continuation series method; this warm start improves the performance of the MCPI significantly and will likely be useful for other applications where efficiently computed approximate orbit solutions are needed.

Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains

DOE PAGES

Bunting, Gregory; Prakash, Arun; Walsh, Timothy; ...

2018-01-26

Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bunting, Gregory; Prakash, Arun; Walsh, Timothy

Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less
New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1998-01-01

Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

NASA Astrophysics Data System (ADS)

Arteaga, Santiago Egido

1998-12-01

The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the linearization strategies considered and whose computational cost is negligible. The algebraic properties of these systems depend on both the discretization and nonlinear method used. We study in detail the positive definiteness and skewsymmetry of the advection submatrices (essentially, convection-diffusion problems). We propose a discretization based on a new trilinear form for Newton's method. We solve the linear systems using three Krylov subspace methods, GMRES, QMR and TFQMR, and compare the advantages of each. Our emphasis is on parallel algorithms, and so we consider preconditioners suitable for parallel computers such as line variants of the Jacobi and Gauss- Seidel methods, alternating direction implicit methods, and Chebyshev and least squares polynomial preconditioners. These work well for moderate viscosities (moderate Reynolds number). For small viscosities we show that effective parallel solution of the advection subproblem is a critical factor to improve performance. Implementation details on a CM-5 are presented.
Modified Chebyshev Picard Iteration for Efficient Numerical Integration of Ordinary Differential Equations

NASA Astrophysics Data System (ADS)

Macomber, B.; Woollands, R. M.; Probe, A.; Younes, A.; Bai, X.; Junkins, J.

2013-09-01

Modified Chebyshev Picard Iteration (MCPI) is an iterative numerical method for approximating solutions of linear or non-linear Ordinary Differential Equations (ODEs) to obtain time histories of system state trajectories. Unlike other step-by-step differential equation solvers, the Runge-Kutta family of numerical integrators for example, MCPI approximates long arcs of the state trajectory with an iterative path approximation approach, and is ideally suited to parallel computation. Orthogonal Chebyshev Polynomials are used as basis functions during each path iteration; the integrations of the Picard iteration are then done analytically. Due to the orthogonality of the Chebyshev basis functions, the least square approximations are computed without matrix inversion; the coefficients are computed robustly from discrete inner products. As a consequence of discrete sampling and weighting adopted for the inner product definition, Runge phenomena errors are minimized near the ends of the approximation intervals. The MCPI algorithm utilizes a vector-matrix framework for computational efficiency. Additionally, all Chebyshev coefficients and integrand function evaluations are independent, meaning they can be simultaneously computed in parallel for further decreased computational cost. Over an order of magnitude speedup from traditional methods is achieved in serial processing, and an additional order of magnitude is achievable in parallel architectures. This paper presents a new MCPI library, a modular toolset designed to allow MCPI to be easily applied to a wide variety of ODE systems. Library users will not have to concern themselves with the underlying mathematics behind the MCPI method. Inputs are the boundary conditions of the dynamical system, the integrand function governing system behavior, and the desired time interval of integration, and the output is a time history of the system states over the interval of interest. Examples from the field of astrodynamics are presented to compare the output from the MCPI library to current state-of-practice numerical integration methods. It is shown that MCPI is capable of out-performing the state-of-practice in terms of computational cost and accuracy.
Krylov subspace methods on supercomputers

NASA Technical Reports Server (NTRS)

Saad, Youcef

1988-01-01

A short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers is presented. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to derive effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is in the solution of the triangular systems at each step. A few approaches consisting of implementing efficient forward and backward triangular solutions are described in detail. Polynomial preconditioning as an alternative to standard incomplete factorization techniques is also discussed. Another efficient approach is to reorder the equations so as to improve the structure of the matrix to achieve better parallelism or vectorization. An overview of these and other ideas and their effectiveness or potential for different types of architectures is given.
Reducing neural network training time with parallel processing

NASA Technical Reports Server (NTRS)

Rogers, James L., Jr.; Lamarsh, William J., II

1995-01-01

Obtaining optimal solutions for engineering design problems is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Previous research has shown that a near optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive neural network. A new approach has been developed to further reduce this time. This approach decomposes a large neural network into many smaller neural networks that can be trained in parallel. Guidelines are developed to avoid some of the pitfalls when training smaller neural networks in parallel. These guidelines allow the engineer: to determine the number of nodes on the hidden layer of the smaller neural networks; to choose the initial training weights; and to select a network configuration that will capture the interactions among the smaller neural networks. This paper presents results describing how these guidelines are developed.
SPIRiT: Iterative Self-consistent Parallel Imaging Reconstruction from Arbitrary k-Space

PubMed Central

Lustig, Michael; Pauly, John M.

2010-01-01

A new approach to autocalibrating, coil-by-coil parallel imaging reconstruction is presented. It is a generalized reconstruction framework based on self consistency. The reconstruction problem is formulated as an optimization that yields the most consistent solution with the calibration and acquisition data. The approach is general and can accurately reconstruct images from arbitrary k-space sampling patterns. The formulation can flexibly incorporate additional image priors such as off-resonance correction and regularization terms that appear in compressed sensing. Several iterative strategies to solve the posed reconstruction problem in both image and k-space domain are presented. These are based on a projection over convex sets (POCS) and a conjugate gradient (CG) algorithms. Phantom and in-vivo studies demonstrate efficient reconstructions from undersampled Cartesian and spiral trajectories. Reconstructions that include off-resonance correction and nonlinear ℓ1-wavelet regularization are also demonstrated. PMID:20665790
Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

NASA Astrophysics Data System (ADS)

Yang, Xiang; Mittal, Rajat

2017-03-01

In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.
Optimal parallel solution of sparse triangular systems

NASA Technical Reports Server (NTRS)

Alvarado, Fernando L.; Schreiber, Robert

1990-01-01

A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-handed sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-handed sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allow for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation

NASA Astrophysics Data System (ADS)

Costa, Carlos A. N.; Campos, Itamara S.; Costa, Jessé C.; Neto, Francisco A.; Schleicher, Jörg; Novais, Amélia

2013-08-01

Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality.
Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.

NASA Astrophysics Data System (ADS)

Liu, Y.; Li, Y.

2016-12-01

We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multi-sources and parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm and model perturbations at each iteration step are obtained by using the Inexact Conjugate Gradient iteration method. Synthetic test inversions are presented.
Use of general purpose graphics processing units with MODFLOW

USGS Publications Warehouse

Hughes, Joseph D.; White, Jeremy T.

2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Hixon, Duane; Sankar, L. N.

1993-01-01

During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
A parallel variable metric optimization algorithm

NASA Technical Reports Server (NTRS)

Straeter, T. A.

1973-01-01

An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blanford, M.

1997-12-31

Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today`s multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemblemore » a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.
Aztec user`s guide. Version 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1995-10-01

Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparsemore » unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.« less

Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Sankar, Lakshmi; Hixon, Duane

1993-01-01

The work done under this project was documented in detail as the Ph. D. dissertation of Dr. Duane Hixon. The objectives of the research project were evaluation of the generalized minimum residual method (GMRES) as a tool for accelerating 2-D and 3-D unsteady flows and evaluation of the suitability of the GMRES algorithm for unsteady flows, computed on parallel computer architectures.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*

PubMed Central

Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.

2014-01-01

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to minx∈ℝn ‖Ax − b‖2, where A ∈ ℝm × n with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

PubMed

Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

2014-01-01

It is very time consuming to solve fractional differential equations. The computational complexity of two-dimensional fractional differential equation (2D-TFDE) with iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.
Scaling Optimization of the SIESTA MHD Code

NASA Astrophysics Data System (ADS)

Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan

2013-10-01

SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Iterative Importance Sampling Algorithms for Parameter Estimation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
A Parallel Fast Sweeping Method for the Eikonal Equation

NASA Astrophysics Data System (ADS)

Baker, B.

2017-12-01

Recently, there has been an exciting emergence of probabilistic methods for travel time tomography. Unlike gradient-based optimization strategies, probabilistic tomographic methods are resistant to becoming trapped in a local minimum and provide a much better quantification of parameter resolution than, say, appealing to ray density or performing checkerboard reconstruction tests. The benefits associated with random sampling methods however are only realized by successive computation of predicted travel times in, potentially, strongly heterogeneous media. To this end this abstract is concerned with expediting the solution of the Eikonal equation. While many Eikonal solvers use a fast marching method, the proposed solver will use the iterative fast sweeping method because the eight fixed sweep orderings in each iteration are natural targets for parallelization. To reduce the number of iterations and grid points required the high-accuracy finite difference stencil of Nobel et al., 2014 is implemented. A directed acyclic graph (DAG) is created with a priori knowledge of the sweep ordering and finite different stencil. By performing a topological sort of the DAG sets of independent nodes are identified as candidates for concurrent updating. Additionally, the proposed solver will also address scalability during earthquake relocation, a necessary step in local and regional earthquake tomography and a barrier to extending probabilistic methods from active source to passive source applications, by introducing an asynchronous parallel forward solve phase for all receivers in the network. Synthetic examples using the SEG over-thrust model will be presented.
Solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators.

PubMed

Zhao, Jing; Zong, Haili

2018-01-01

In this paper, we propose parallel and cyclic iterative algorithms for solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators. We also combine the process of cyclic and parallel iterative methods and propose two mixed iterative algorithms. Our several algorithms do not need any prior information about the operator norms. Under mild assumptions, we prove weak convergence of the proposed iterative sequences in Hilbert spaces. As applications, we obtain several iterative algorithms to solve the multiple-set split equality problem.
Iterative algorithms for large sparse linear systems on parallel computers

NASA Technical Reports Server (NTRS)

Adams, L. M.

1982-01-01

Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

NASA Astrophysics Data System (ADS)

Schultz, A.

2010-12-01

3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
A fast iterative scheme for the linearized Boltzmann equation

NASA Astrophysics Data System (ADS)

Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.

2017-06-01

Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference between these results and those using the hard-sphere potential is discussed.
Solution of partial differential equations on vector and parallel computers

NASA Technical Reports Server (NTRS)

Ortega, J. M.; Voigt, R. G.

1985-01-01

The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saad, Yousef

2014-01-16

The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project begun with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners formore » solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version on our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth.10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.« less
Approximate optimal guidance for the advanced launch system

NASA Technical Reports Server (NTRS)

Feeley, T. S.; Speyer, J. L.

1993-01-01

A real-time guidance scheme for the problem of maximizing the payload into orbit subject to the equations of motion for a rocket over a spherical, non-rotating earth is presented. An approximate optimal launch guidance law is developed based upon an asymptotic expansion of the Hamilton - Jacobi - Bellman or dynamic programming equation. The expansion is performed in terms of a small parameter, which is used to separate the dynamics of the problem into primary and perturbation dynamics. For the zeroth-order problem the small parameter is set to zero and a closed-form solution to the zeroth-order expansion term of Hamilton - Jacobi - Bellman equation is obtained. Higher-order terms of the expansion include the effects of the neglected perturbation dynamics. These higher-order terms are determined from the solution of first-order linear partial differential equations requiring only the evaluation of quadratures. This technique is preferred as a real-time, on-line guidance scheme to alternative numerical iterative optimization schemes because of the unreliable convergence properties of these iterative guidance schemes and because the quadratures needed for the approximate optimal guidance law can be performed rapidly and by parallel processing. Even if the approximate solution is not nearly optimal, when using this technique the zeroth-order solution always provides a path which satisfies the terminal constraints. Results for two-degree-of-freedom simulations are presented for the simplified problem of flight in the equatorial plane and compared to the guidance scheme generated by the shooting method which is an iterative second-order technique.
A finite element solver for 3-D compressible viscous flows

NASA Technical Reports Server (NTRS)

Reddy, K. C.; Reddy, J. N.; Nayani, S.

1990-01-01

Computation of the flow field inside a space shuttle main engine (SSME) requires the application of state of the art computational fluid dynamic (CFD) technology. Several computer codes are under development to solve 3-D flow through the hot gas manifold. Some algorithms were designed to solve the unsteady compressible Navier-Stokes equations, either by implicit or explicit factorization methods, using several hundred or thousands of time steps to reach a steady state solution. A new iterative algorithm is being developed for the solution of the implicit finite element equations without assembling global matrices. It is an efficient iteration scheme based on a modified nonlinear Gauss-Seidel iteration with symmetric sweeps. The algorithm is analyzed for a model equation and is shown to be unconditionally stable. Results from a series of test problems are presented. The finite element code was tested for couette flow, which is flow under a pressure gradient between two parallel plates in relative motion. Another problem that was solved is viscous laminar flow over a flat plate. The general 3-D finite element code was used to compute the flow in an axisymmetric turnaround duct at low Mach numbers.
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Twostep-by-twostep PIRK-type PC methods with continuous output formulas

NASA Astrophysics Data System (ADS)

Cong, Nguyen Huu; Xuan, Le Ngoc

2008-11-01

This paper deals with parallel predictor-corrector (PC) iteration methods based on collocation Runge-Kutta (RK) corrector methods with continuous output formulas for solving nonstiff initial-value problems (IVPs) for systems of first-order differential equations. At nth step, the continuous output formulas are used not only for predicting the stage values in the PC iteration methods but also for calculating the step values at (n+2)th step. In this case, the integration processes can be proceeded twostep-by-twostep. The resulting twostep-by-twostep (TBT) parallel-iterated RK-type (PIRK-type) methods with continuous output formulas (twostep-by-twostep PIRKC methods or TBTPIRKC methods) give us a faster integration process. Fixed stepsize applications of these TBTPIRKC methods to a few widely-used test problems reveal that the new PC methods are much more efficient when compared with the well-known parallel-iterated RK methods (PIRK methods), parallel-iterated RK-type PC methods with continuous output formulas (PIRKC methods) and sequential explicit RK codes DOPRI5 and DOP853 available from the literature.
Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

Turner, Mark G.; Topp, David A.

1998-01-01

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no 1/0 or body force calculation) for a grid which has 227 points axially.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS

PubMed Central

Fu, Zhisong; Kirby, Robert M.; Whitaker, Ross T.

2014-01-01

Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers. PMID:25221418
A conjugate gradient method for solving the non-LTE line radiation transfer problem

NASA Astrophysics Data System (ADS)

Paletou, F.; Anterrieu, E.

2009-12-01

This study concerns the fast and accurate solution of the line radiation transfer problem, under non-LTE conditions. We propose and evaluate an alternative iterative scheme to the classical ALI-Jacobi method, and to the more recently proposed Gauss-Seidel and successive over-relaxation (GS/SOR) schemes. Our study is indeed based on applying a preconditioned bi-conjugate gradient method (BiCG-P). Standard tests, in 1D plane parallel geometry and in the frame of the two-level atom model with monochromatic scattering are discussed. Rates of convergence between the previously mentioned iterative schemes are compared, as are their respective timing properties. The smoothing capability of the BiCG-P method is also demonstrated.

An asymptotic-preserving Lagrangian algorithm for the time-dependent anisotropic heat transport equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.

2014-09-01

We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while themore » second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio X ⊥ /X ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = X⊥L 2 ∥/X1L 2 ⊥ → 0 (with L∥∙ L⊥ , the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.« less
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE PAGES

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

2015-12-01

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Gauss-Seidel Iterative Method as a Real-Time Pile-Up Solver of Scintillation Pulses

NASA Astrophysics Data System (ADS)

Novak, Roman; Vencelj, Matja¿

2009-12-01

The pile-up rejection in nuclear spectroscopy has been confronted recently by several pile-up correction schemes that compensate for distortions of the signal and subsequent energy spectra artifacts as the counting rate increases. We study here a real-time capability of the event-by-event correction method, which at the core translates to solving many sets of linear equations. Tight time limits and constrained front-end electronics resources make well-known direct solvers inappropriate. We propose a novel approach based on the Gauss-Seidel iterative method, which turns out to be a stable and cost-efficient solution to improve spectroscopic resolution in the front-end electronics. We show the method convergence properties for a class of matrices that emerge in calorimetric processing of scintillation detector signals and demonstrate the ability of the method to support the relevant resolutions. The sole iteration-based error component can be brought below the sliding window induced errors in a reasonable number of iteration steps, thus allowing real-time operation. An area-efficient hardware implementation is proposed that fully utilizes the method's inherent parallelism.
Resistive MHD Stability Analysis in Near Real-time

NASA Astrophysics Data System (ADS)

Glasser, Alexander; Kolemen, Egemen

2017-10-01

We discuss the feasibility of a near real-time calculation of the tokamak Δ' matrix, which summarizes MHD stability to resistive modes, such as tearing and interchange modes. As the operational phase of ITER approaches, solutions for active feedback tokamak stability control are needed. It has been previously demonstrated that an ideal MHD stability analysis is achievable on a sub- O (1 s) timescale, as is required to control phenomena comparable with the MHD-evolution timescale of ITER. In the present work, we broaden this result to incorporate the effects of resistive MHD modes. Such modes satisfy ideal MHD equations in regions outside narrow resistive layers that form at singular surfaces. We demonstrate that the use of asymptotic expansions at the singular surfaces, as well as the application of state transition matrices, enable a fast, parallelized solution to the singular outer layer boundary value problem, and thereby rapidly compute Δ'. Sponsored by US DOE under DE-SC0015878 and DE-FC02-04ER54698.
Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy.

PubMed

Zelyak, O; Fallone, B G; St-Aubin, J

2017-12-14

Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.
Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy

NASA Astrophysics Data System (ADS)

Zelyak, O.; Fallone, B. G.; St-Aubin, J.

2018-01-01

Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.
Corrigendum to "Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy".

PubMed

Zelyak, Oleksandr; Fallone, B Gino; St-Aubin, Joel

2018-03-12

Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation. © 2018 Institute of Physics and Engineering in Medicine.
A wavelet approach to binary blackholes with asynchronous multitasking

NASA Astrophysics Data System (ADS)

Lim, Hyun; Hirschmann, Eric; Neilsen, David; Anderson, Matthew; Debuhr, Jackson; Zhang, Bo

2016-03-01

Highly accurate simulations of binary black holes and neutron stars are needed to address a variety of interesting problems in relativistic astrophysics. We present a new method for the solving the Einstein equations (BSSN formulation) using iterated interpolating wavelets. Wavelet coefficients provide a direct measure of the local approximation error for the solution and place collocation points that naturally adapt to features of the solution. Further, they exhibit exponential convergence on unevenly spaced collection points. The parallel implementation of the wavelet simulation framework presented here deviates from conventional practice in combining multi-threading with a form of message-driven computation sometimes referred to as asynchronous multitasking.
Acceleration of linear stationary iterative processes in multiprocessor computers. II

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romm, Ya.E.

1982-05-01

For pt.I, see Kibernetika, vol.18, no.1, p.47 (1982). For pt.I, see Cybernetics, vol.18, no.1, p.54 (1982). Considers a reduced system of linear algebraic equations x=ax+b, where a=(a/sub ij/) is a real n*n matrix; b is a real vector with common euclidean norm >>>. It is supposed that the existence and uniqueness of solution det (0-a) not equal to e is given, where e is a unit matrix. The linear iterative process converging to x x/sup (k+1)/=fx/sup (k)/, k=0, 1, 2, ..., where the operator f translates r/sup n/ into r/sup n/. In considering implementation of the iterative process (ip) inmore » a multiprocessor system, it is assumed that the number of processors is constant, and are various values of the latter investigated; it is assumed in addition, that the processors perform elementary binary arithmetic operations of addition and multiestimates only include the time of execution of arithmetic operations. With any paralleling of individual iteration, the execution time of the ip is proportional to the number of sequential steps k+1. The author sets the task of reducing the number of sequential steps in the ip so as to execute it in a time proportional to a value smaller than k+1. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the ip, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.« less
State Transition Matrix for Perturbed Orbital Motion Using Modified Chebyshev Picard Iteration

NASA Astrophysics Data System (ADS)

Read, Julie L.; Younes, Ahmad Bani; Macomber, Brent; Turner, James; Junkins, John L.

2015-06-01

The Modified Chebyshev Picard Iteration (MCPI) method has recently proven to be highly efficient for a given accuracy compared to several commonly adopted numerical integration methods, as a means to solve for perturbed orbital motion. This method utilizes Picard iteration, which generates a sequence of path approximations, and Chebyshev Polynomials, which are orthogonal and also enable both efficient and accurate function approximation. The nodes consistent with discrete Chebyshev orthogonality are generated using cosine sampling; this strategy also reduces the Runge effect and as a consequence of orthogonality, there is no matrix inversion required to find the basis function coefficients. The MCPI algorithms considered herein are parallel-structured so that they are immediately well-suited for massively parallel implementation with additional speedup. MCPI has a wide range of applications beyond ephemeris propagation, including the propagation of the State Transition Matrix (STM) for perturbed two-body motion. A solution is achieved for a spherical harmonic series representation of earth gravity (EGM2008), although the methodology is suitable for application to any gravity model. Included in this representation the normalized, Associated Legendre Functions are given and verified numerically. Modifications of the classical algorithm techniques, such as rewriting the STM equations in a second-order cascade formulation, gives rise to additional speedup. Timing results for the baseline formulation and this second-order formulation are given.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Visualization of Unsteady Computational Fluid Dynamics

NASA Technical Reports Server (NTRS)

Haimes, Robert

1997-01-01

The current compute environment that most researchers are using for the calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a super-computer class machine. The Massively Parallel Processors (MPP's) such as the 160 node IBM SP2 at NAS and clusters of workstations acting as a single MPP (like NAS's SGI Power-Challenge array and the J90 cluster) provide the required computation bandwidth for CFD calculations of transient problems. If we follow the traditional computational analysis steps for CFD (and we wish to construct an interactive visualizer) we need to be aware of the following: (1) Disk space requirements. A single snap-shot must contain at least the values (primitive variables) stored at the appropriate locations within the mesh. For most simple 3D Euler solvers that means 5 floating point words. Navier-Stokes solutions with turbulence models may contain 7 state-variables. (2) Disk speed vs. Computational speeds. The time required to read the complete solution of a saved time frame from disk is now longer than the compute time for a set number of iterations from an explicit solver. Depending, on the hardware and solver an iteration of an implicit code may also take less time than reading the solution from disk. If one examines the performance improvements in the last decade or two, it is easy to see that depending on disk performance (vs. CPU improvement) may not be the best method for enhancing interactivity. (3) Cluster and Parallel Machine I/O problems. Disk access time is much worse within current parallel machines and cluster of workstations that are acting in concert to solve a single problem. In this case we are not trying to read the volume of data, but are running the solver and the solver outputs the solution. These traditional network interfaces must be used for the file system. (4) Numerics of particle traces. Most visualization tools can work upon a single snap shot of the data but some visualization tools for transient problems require dealing with time.
Regularization and computational methods for precise solution of perturbed orbit transfer problems

NASA Astrophysics Data System (ADS)

Woollands, Robyn Michele

The author has developed a suite of algorithms for solving the perturbed Lambert's problem in celestial mechanics. These algorithms have been implemented as a parallel computation tool that has broad applicability. This tool is composed of four component algorithms and each provides unique benefits for solving a particular type of orbit transfer problem. The first one utilizes a Keplerian solver (a-iteration) for solving the unperturbed Lambert's problem. This algorithm not only provides a "warm start" for solving the perturbed problem but is also used to identify which of several perturbed solvers is best suited for the job. The second algorithm solves the perturbed Lambert's problem using a variant of the modified Chebyshev-Picard iteration initial value solver that solves two-point boundary value problems. This method converges over about one third of an orbit and does not require a Newton-type shooting method and thus no state transition matrix needs to be computed. The third algorithm makes use of regularization of the differential equations through the Kustaanheimo-Stiefel transformation and extends the domain of convergence over which the modified Chebyshev-Picard iteration two-point boundary value solver will converge, from about one third of an orbit to almost a full orbit. This algorithm also does not require a Newton-type shooting method. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver to solve the perturbed two-impulse Lambert problem over multiple revolutions. The method of particular solutions is a shooting method but differs from the Newton-type shooting methods in that it does not require integration of the state transition matrix. The mathematical developments that underlie these four algorithms are derived in the chapters of this dissertation. For each of the algorithms, some orbit transfer test cases are included to provide insight on accuracy and efficiency of these individual algorithms. Following this discussion, the combined parallel algorithm, known as the unified Lambert tool, is presented and an explanation is given as to how it automatically selects which of the three perturbed solvers to compute the perturbed solution for a particular orbit transfer. The unified Lambert tool may be used to determine a single orbit transfer or for generating of an extremal field map. A case study is presented for a mission that is required to rendezvous with two pieces of orbit debris (spent rocket boosters). The unified Lambert tool software developed in this dissertation is already being utilized by several industrial partners and we are confident that it will play a significant role in practical applications, including solution of Lambert problems that arise in the current applications focused on enhanced space situational awareness.
AZTEC: A parallel iterative package for the solving linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1996-12-31

We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
A Parallel Genetic Algorithm for Automated Electronic Circuit Design

NASA Technical Reports Server (NTRS)

Long, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris

2000-01-01

Parallelized versions of genetic algorithms (GAs) are popular primarily for three reasons: the GA is an inherently parallel algorithm, typical GA applications are very compute intensive, and powerful computing platforms, especially Beowulf-style computing clusters, are becoming more affordable and easier to implement. In addition, the low communication bandwidth required allows the use of inexpensive networking hardware such as standard office ethernet. In this paper we describe a parallel GA and its use in automated high-level circuit design. Genetic algorithms are a type of trial-and-error search technique that are guided by principles of Darwinian evolution. Just as the genetic material of two living organisms can intermix to produce offspring that are better adapted to their environment, GAs expose genetic material, frequently strings of 1s and Os, to the forces of artificial evolution: selection, mutation, recombination, etc. GAs start with a pool of randomly-generated candidate solutions which are then tested and scored with respect to their utility. Solutions are then bred by probabilistically selecting high quality parents and recombining their genetic representations to produce offspring solutions. Offspring are typically subjected to a small amount of random mutation. After a pool of offspring is produced, this process iterates until a satisfactory solution is found or an iteration limit is reached. Genetic algorithms have been applied to a wide variety of problems in many fields, including chemistry, biology, and many engineering disciplines. There are many styles of parallelism used in implementing parallel GAs. One such method is called the master-slave or processor farm approach. In this technique, slave nodes are used solely to compute fitness evaluations (the most time consuming part). The master processor collects fitness scores from the nodes and performs the genetic operators (selection, reproduction, variation, etc.). Because of dependency issues in the GA, it is possible to have idle processors. However, as long as the load at each processing node is similar, the processors are kept busy nearly all of the time. In applying GAs to circuit design, a suitable genetic representation 'is that of a circuit-construction program. We discuss one such circuit-construction programming language and show how evolution can generate useful analog circuit designs. This language has the desirable property that virtually all sets of combinations of primitives result in valid circuit graphs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. Using a parallel genetic algorithm and circuit simulation software, we present experimental results as applied to three analog filter and two amplifier design tasks. For example, a figure shows an 85 dB amplifier design evolved by our system, and another figure shows the performance of that circuit (gain and frequency response). In all tasks, our system is able to generate circuits that achieve the target specifications.
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

NASA Technical Reports Server (NTRS)

Bailey, David H.; Ferguson, Helaman R. P.

1988-01-01

Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Spotz, William F.

PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of themore » underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.« less
Parallel architectures for iterative methods on adaptive, block structured grids

NASA Technical Reports Server (NTRS)

Gannon, D.; Vanrosendale, J.

1983-01-01

A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.

Organizing Compression of Hyperspectral Imagery to Allow Efficient Parallel Decompression

NASA Technical Reports Server (NTRS)

Klimesh, Matthew A.; Kiely, Aaron B.

2014-01-01

family of schemes has been devised for organizing the output of an algorithm for predictive data compression of hyperspectral imagery so as to allow efficient parallelization in both the compressor and decompressor. In these schemes, the compressor performs a number of iterations, during each of which a portion of the data is compressed via parallel threads operating on independent portions of the data. The general idea is that for each iteration it is predetermined how much compressed data will be produced from each thread.
Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness

NASA Astrophysics Data System (ADS)

Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.

2018-03-01

This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational Awareness computer cluster at the LASR Lab, Texas A&M University. We demonstrate the power of our tool by solving a highly parallel example problem, that is the generation of extremal field maps for optimal spacecraft rendezvous (and eventual orbit debris removal). In addition we demonstrate the need for including perturbative effects in simulations for satellite tracking or data association. The unified Lambert tool is ideal for but not limited to space situational awareness applications.
Parallel-vector computation for structural analysis and nonlinear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1990-01-01

Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

PubMed

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-06-01

We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

PubMed Central

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-01-01

We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration

NASA Astrophysics Data System (ADS)

Zhang, Y.; Key, K.; Ovall, J.; Holst, M.

2014-12-01

We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented adaptive refinement code named MARE2DEM. We demonstrate the performance and parallel scaling of this algorithm on a medium-scale computing cluster with a marine controlled-source EM example that includes a 3D array of receivers located over a 3D model that includes significant seafloor bathymetry variations and a heterogeneous subsurface.
Mitigation of Parallel RF Potentials by an Appropriate Antenna Design Using TOPICA

NASA Astrophysics Data System (ADS)

Maggiora, R.; Milanesio, D.

2011-12-01

A substantial effort has been devoted in recent years to the optimization of the ITER Ion Cyclotron (IC) launcher [1], above all with the aim of maximizing the coupling performances of the antenna; good improvements have been documented by using TOPICA code [2], a predictive tool for the design and optimization of RF launchers in front of a plasma region. Despite the progresses in the mentioned topic, this is not the only issue related to the design of IC antennas: a second crucial aspect is the impurities production, which is driven by the parallel RF potentials generated by the antenna itself and by the surrounding structures. The goal of this work is to analyze a set of innovative solutions that could be implemented in the next generation of IC antennas in order to mitigate the parallel RF potentials without reducing the power delivered to plasma. To achieve this challenging task, the TOPICA code has been adopted, taking advantage of recently introduced features. In particular, the code permits to compute the electric field distribution everywhere inside the antenna enclosure and in the plasma column, allowing to determine not only the magnitude and shape of the fields in front of the antenna, but also to evaluate their radial decay. Provided the electric field map, it is then possible to determine the parallel RF potentials and, even more important, to directly verify the impact of geometrical modifications of the front elements of the antenna on the RF potentials themselves. Furthermore, the capability to simulate the full 3D antenna with a high geometrical accuracy (as the one provided by commercial codes) and to account for an accurate plasma model indicates in TOPICA code a perfect candidate for this specific task. To lower the parallel RF potentials, two complementary approaches are outlined in the paper: the first one acts on the reduction of the electric field values, the second works on the minimization of the geometrical asymmetries. Pros and cons of the adopted solutions are discussed in detail. Two realistic cases have been taken into account in this work. Firstly, an ITER-like IC launcher has been adopted as a reference and optimized, then few solutions have been proposed for the ASDEX Upgrade experiment, with the final goal of testing the most promising concept for the machine in the coming years. This second activity has been carried out in collaboration with IPP-Garching and ENEA-Frascati.
Computational aspects of helicopter trim analysis and damping levels from Floquet theory

NASA Technical Reports Server (NTRS)

Gaonkar, Gopal H.; Achar, N. S.

1992-01-01

Helicopter trim settings of periodic initial state and control inputs are investigated for convergence of Newton iteration in computing the settings sequentially and in parallel. The trim analysis uses a shooting method and a weak version of two temporal finite element methods with displacement formulation and with mixed formulation of displacements and momenta. These three methods broadly represent two main approaches of trim analysis: adaptation of initial-value and finite element boundary-value codes to periodic boundary conditions, particularly for unstable and marginally stable systems. In each method, both the sequential and in-parallel schemes are used and the resulting nonlinear algebraic equations are solved by damped Newton iteration with an optimally selected damping parameter. The impact of damped Newton iteration, including earlier-observed divergence problems in trim analysis, is demonstrated by the maximum condition number of the Jacobian matrices of the iterative scheme and by virtual elimination of divergence. The advantages of the in-parallel scheme over the conventional sequential scheme are also demonstrated.
Project APhiD: A Lorenz-gauged A-Φ decomposition for parallelized computation of ultra-broadband electromagnetic induction in a fully heterogeneous Earth

NASA Astrophysics Data System (ADS)

Weiss, Chester J.

2013-08-01

An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10-2-1010 Hz and range 10-3-105 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
Glenn-ht/bem Conjugate Heat Transfer Solver for Large-scale Turbomachinery Models

NASA Technical Reports Server (NTRS)

Divo, E.; Steinthorsson, E.; Rodriquez, F.; Kassab, A. J.; Kapat, J. S.; Heidmann, James D. (Technical Monitor)

2003-01-01

A coupled Boundary Element/Finite Volume Method temperature-forward/flux-hack algorithm is developed for conjugate heat transfer (CHT) applications. A loosely coupled strategy is adopted with each field solution providing boundary conditions for the other in an iteration seeking continuity of temperature and heat flux at the fluid-solid interface. The NASA Glenn Navier-Stokes code Glenn-HT is coupled to a 3-D BEM steady state heat conduction code developed at the University of Central Florida. Results from CHT simulation of a 3-D film-cooled blade section are presented and compared with those computed by a two-temperature approach. Also presented are current developments of an iterative domain decomposition strategy accommodating large numbers of unknowns in the BEM. The blade is artificially sub-sectioned in the span-wise direction, 3-D BEM solutions are obtained in the subdomains, and interface temperatures are averaged symmetrically when the flux is updated while the fluxes are averaged anti-symmetrically to maintain continuity of heat flux when the temperatures are updated. An initial guess for interface temperatures uses a physically-based 1-D conduction argument to provide an effective starting point and significantly reduce iteration. 2-D and 3-D results show the process converges efficiently and offers substantial computational and storage savings. Future developments include a parallel multi-grid implementation of the approach under MPI for computation on PC clusters.
Run-time parallelization and scheduling of loops

NASA Technical Reports Server (NTRS)

Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

1991-01-01

Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
Exploiting parallel computing with limited program changes using a network of microcomputers

NASA Technical Reports Server (NTRS)

Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.

1985-01-01

Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronus processing was studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.
Adagio 4.20 User’s Guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Spencer, Benjamin Whiting; Crane, Nathan K.; Heinstein, Martin W.

2011-03-01

Adagio is a Lagrangian, three-dimensional, implicit code for the analysis of solids and structures. It uses a multi-level iterative solver, which enables it to solve problems with large deformations, nonlinear material behavior, and contact. It also has a versatile library of continuum and structural elements, and an extensive library of material models. Adagio is written for parallel computing environments, and its solvers allow for scalable solutions of very large problems. Adagio uses the SIERRA Framework, which allows for coupling with other SIERRA mechanics codes. This document describes the functionality and input structure for Adagio.
An iterative reduced field-of-view reconstruction for periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI.

PubMed

Lin, Jyh-Miin; Patterson, Andrew J; Chang, Hing-Chiu; Gillard, Jonathan H; Graves, Martin J

2015-10-01

To propose a new reduced field-of-view (rFOV) strategy for iterative reconstructions in a clinical environment. Iterative reconstructions can incorporate regularization terms to improve the image quality of periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI. However, the large amount of calculations required for full FOV iterative reconstructions has posed a huge computational challenge for clinical usage. By subdividing the entire problem into smaller rFOVs, the iterative reconstruction can be accelerated on a desktop with a single graphic processing unit (GPU). This rFOV strategy divides the iterative reconstruction into blocks, based on the block-diagonal dominant structure. A near real-time reconstruction system was developed for the clinical MR unit, and parallel computing was implemented using the object-oriented model. In addition, the Toeplitz method was implemented on the GPU to reduce the time required for full interpolation. Using the data acquired from the PROPELLER MRI, the reconstructed images were then saved in the digital imaging and communications in medicine format. The proposed rFOV reconstruction reduced the gridding time by 97%, as the total iteration time was 3 s even with multiple processes running. A phantom study showed that the structure similarity index for rFOV reconstruction was statistically superior to conventional density compensation (p < 0.001). In vivo study validated the increased signal-to-noise ratio, which is over four times higher than with density compensation. Image sharpness index was improved using the regularized reconstruction implemented. The rFOV strategy permits near real-time iterative reconstruction to improve the image quality of PROPELLER images. Substantial improvements in image quality metrics were validated in the experiments. The concept of rFOV reconstruction may potentially be applied to other kinds of iterative reconstructions for shortened reconstruction duration.
Region of interest processing for iterative reconstruction in x-ray computed tomography

NASA Astrophysics Data System (ADS)

Kopp, Felix K.; Nasirudin, Radin A.; Mei, Kai; Fehringer, Andreas; Pfeiffer, Franz; Rummeny, Ernst J.; Noël, Peter B.

2015-03-01

The recent advancements in the graphics card technology raised the performance of parallel computing and contributed to the introduction of iterative reconstruction methods for x-ray computed tomography in clinical CT scanners. Iterative maximum likelihood (ML) based reconstruction methods are known to reduce image noise and to improve the diagnostic quality of low-dose CT. However, iterative reconstruction of a region of interest (ROI), especially ML based, is challenging. But for some clinical procedures, like cardiac CT, only a ROI is needed for diagnostics. A high-resolution reconstruction of the full field of view (FOV) consumes unnecessary computation effort that results in a slower reconstruction than clinically acceptable. In this work, we present an extension and evaluation of an existing ROI processing algorithm. Especially improvements for the equalization between regions inside and outside of a ROI are proposed. The evaluation was done on data collected from a clinical CT scanner. The performance of the different algorithms is qualitatively and quantitatively assessed. Our solution to the ROI problem provides an increase in signal-to-noise ratio and leads to visually less noise in the final reconstruction. The reconstruction speed of our technique was observed to be comparable with other previous proposed techniques. The development of ROI processing algorithms in combination with iterative reconstruction will provide higher diagnostic quality in the near future.
Fast Numerical Solution of the Plasma Response Matrix for Real-time Ideal MHD Control

DOE Office of Scientific and Technical Information (OSTI.GOV)

Glasser, Alexander; Kolemen, Egemen; Glasser, Alan H.

To help effectuate near real-time feedback control of ideal MHD instabilities in tokamak geometries, a parallelized version of A.H. Glasser’s DCON (Direct Criterion of Newcomb) code is developed. To motivate the numerical implementation, we first solve DCON’s δW formulation with a Hamilton-Jacobi theory, elucidating analytical and numerical features of the ideal MHD stability problem. The plasma response matrix is demonstrated to be the solution of an ideal MHD Riccati equation. We then describe our adaptation of DCON with numerical methods natural to solutions of the Riccati equation, parallelizing it to enable its operation in near real-time. We replace DCON’s serial integration of perturbed modes—which satisfy a singular Euler- Lagrange equation—with a domain-decomposed integration of state transition matrices. Output is shown to match results from DCON with high accuracy, and with computation time < 1s. Such computational speed may enable active feedback ideal MHD stability control, especially in plasmas whose ideal MHD equilibria evolve with inductive timescalemore » $$\\tau$$ ≳ 1s—as in ITER. Further potential applications of this theory are discussed.« less
Fast Numerical Solution of the Plasma Response Matrix for Real-time Ideal MHD Control

DOE PAGES

Glasser, Alexander; Kolemen, Egemen; Glasser, Alan H.

2018-03-26

To help effectuate near real-time feedback control of ideal MHD instabilities in tokamak geometries, a parallelized version of A.H. Glasser’s DCON (Direct Criterion of Newcomb) code is developed. To motivate the numerical implementation, we first solve DCON’s δW formulation with a Hamilton-Jacobi theory, elucidating analytical and numerical features of the ideal MHD stability problem. The plasma response matrix is demonstrated to be the solution of an ideal MHD Riccati equation. We then describe our adaptation of DCON with numerical methods natural to solutions of the Riccati equation, parallelizing it to enable its operation in near real-time. We replace DCON’s serial integration of perturbed modes—which satisfy a singular Euler- Lagrange equation—with a domain-decomposed integration of state transition matrices. Output is shown to match results from DCON with high accuracy, and with computation time < 1s. Such computational speed may enable active feedback ideal MHD stability control, especially in plasmas whose ideal MHD equilibria evolve with inductive timescalemore » $$\\tau$$ ≳ 1s—as in ITER. Further potential applications of this theory are discussed.« less
Distorted Born iterative T-matrix method for inversion of CSEM data in anisotropic media

NASA Astrophysics Data System (ADS)

Jakobsen, Morten; Tveit, Svenn

2018-05-01

We present a direct iterative solutions to the nonlinear controlled-source electromagnetic (CSEM) inversion problem in the frequency domain, which is based on a volume integral equation formulation of the forward modelling problem in anisotropic conductive media. Our vectorial nonlinear inverse scattering approach effectively replaces an ill-posed nonlinear inverse problem with a series of linear ill-posed inverse problems, for which there already exist efficient (regularized) solution methods. The solution update the dyadic Green's function's from the source to the scattering-volume and from the scattering-volume to the receivers, after each iteration. The T-matrix approach of multiple scattering theory is used for efficient updating of all dyadic Green's functions after each linearized inversion step. This means that we have developed a T-matrix variant of the Distorted Born Iterative (DBI) method, which is often used in the acoustic and electromagnetic (medical) imaging communities as an alternative to contrast-source inversion. The main advantage of using the T-matrix approach in this context, is that it eliminates the need to perform a full forward simulation at each iteration of the DBI method, which is known to be consistent with the Gauss-Newton method. The T-matrix allows for a natural domain decomposition, since in the sense that a large model can be decomposed into an arbitrary number of domains that can be treated independently and in parallel. The T-matrix we use for efficient model updating is also independent of the source-receiver configuration, which could be an advantage when performing fast-repeat modelling and time-lapse inversion. The T-matrix is also compatible with the use of modern renormalization methods that can potentially help us to reduce the sensitivity of the CSEM inversion results on the starting model. To illustrate the performance and potential of our T-matrix variant of the DBI method for CSEM inversion, we performed a numerical experiments based on synthetic CSEM data associated with 2D VTI and 3D orthorombic model inversions. The results of our numerical experiment suggest that the DBIT method for inversion of CSEM data in anisotropic media is both accurate and efficient.
An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

NASA Astrophysics Data System (ADS)

Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

2015-07-01

This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less

Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

NASA Astrophysics Data System (ADS)

Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

2016-05-01

This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE PAGES

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...

2017-06-29

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less
Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
An analytically iterative method for solving problems of cosmic-ray modulation

NASA Astrophysics Data System (ADS)

Kolesnyk, Yuriy L.; Bobik, Pavol; Shakhov, Boris A.; Putis, Marian

2017-09-01

The development of an analytically iterative method for solving steady-state as well as unsteady-state problems of cosmic-ray (CR) modulation is proposed. Iterations for obtaining the solutions are constructed for the spherically symmetric form of the CR propagation equation. The main solution of the considered problem consists of the zero-order solution that is obtained during the initial iteration and amendments that may be obtained by subsequent iterations. The finding of the zero-order solution is based on the CR isotropy during propagation in the space, whereas the anisotropy is taken into account when finding the next amendments. To begin with, the method is applied to solve the problem of CR modulation where the diffusion coefficient κ and the solar wind speed u are constants with an Local Interstellar Spectra (LIS) spectrum. The solution obtained with two iterations was compared with an analytical solution and with numerical solutions. Finally, solutions that have only one iteration for two problems of CR modulation with u = constant and the same form of LIS spectrum were obtained and tested against numerical solutions. For the first problem, κ is proportional to the momentum of the particle p, so it has the form κ = k0η, where η =p/m_0c. For the second problem, the diffusion coefficient is given in the form κ = k0βη, where β =v/c is the particle speed relative to the speed of light. There was a good matching of the obtained solutions with the numerical solutions as well as with the analytical solution for the problem where κ = constant.
A parallel algorithm for 2D visco-acoustic frequency-domain full-waveform inversion: application to a dense OBS data set

NASA Astrophysics Data System (ADS)

Sourbier, F.; Operto, S.; Virieux, J.

2006-12-01

We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in fortran90 and use MPI for parallelism. The algorithm was applied to real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern-Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies by proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency-domain, solving the wave equation requires resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computer to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided in 3 main steps: a symbolic analysis step that performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization and an estimation of the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accomodate numerical pivoting and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, 2 simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute in parallel the gradient of the cost function. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communation. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non redondant shot and receiver position. The same strategy that the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 and 15 Hz were inverted. Tweny iterations per frequency were computed leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km corresponding to a finite-difference grid of 4201 x 1001 grid with a 25-m grid interval. The number of shot was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bits bi-processor nodes with 4 Gbytes of RAM memory per node when only the LU factorization is performed in parallel. Preliminary estimations of the time required to perform the inversion with the fully-parallelized code is 6 and 4 days using 20 and 50 processors respectively.
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.
PISCES: An environment for parallel scientific computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Widlund, Olof B.

2015-06-09

The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less
Towards the Experimental Assessment of the DQE in SPECT Scanners

NASA Astrophysics Data System (ADS)

Fountos, G. P.; Michail, C. M.

2017-11-01

The purpose of this work was to introduce the Detective Quantum Efficiency (DQE) in single photon emission computed tomography (SPECT) systems using a flood source. A Tc-99m-based flood source (Eγ = 140 keV) consisting of a radiopharmaceutical solution of dithiothreitol (DTT, 10-3 M)/Tc-99m(III)-DMSA, 40 mCi/40 ml bound to the grains of an Agfa MammoRay HDR Medical X-ray film) was prepared in laboratory. The source was placed between two PMMA blocks and images were obtained by using the brain tomographic acquisition protocol (DatScan-brain). The Modulation Transfer Function (MTF) was evaluated using the Iterative 2D algorithm. All imaging experiments were performed in a Siemens e-Cam gamma camera. The Normalized Noise Power spectra (NNPS) were obtained from the sagittal views of the source. The higher MTF values were obtained for the Flash Iterative 2D with 24 iterations and 20 subsets. The noise levels of the SPECT reconstructed images, in terms of the NNPS, were found to increase as the number of iterations increase. The behavior of the DQE was influenced by both MTF and NNPS. As the number of iterations was increased, higher MTF values were obtained, however with a parallel, increase of magnitude in image noise, as depicted from the NNPS results. DQE values, which were influenced by both MTF and NNPS, were found higher when the number of iterations results in resolution saturation. The method presented here is novel and easy to implement, requiring materials commonly found in clinical practice and can be useful in the quality control of SPECT scanners.
Run-time parallelization and scheduling of loops

NASA Technical Reports Server (NTRS)

Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

1990-01-01

Run time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases, where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run time, wave fronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments.

PubMed

Fisicaro, G; Genovese, L; Andreussi, O; Marzari, N; Goedecker, S

2016-01-07

The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence, the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equations for neutral and ionic solutions, respectively. In the present work, solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented for the solution of the generalized Poisson equation and the linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of the ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency and allow for the treatment of periodic, free, and slab boundary conditions. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes.
A generalized Poisson and Poisson-Boltzmann solver for electrostatic environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fisicaro, G., E-mail: giuseppe.fisicaro@unibas.ch; Goedecker, S.; Genovese, L.

2016-01-07

The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of applied electrochemical potentials, taking into account the non-trivial electrostatic screening coming from the solvent and the electrolytes. As a consequence, the electrostatic potential has to be found by solving the generalized Poisson and the Poisson-Boltzmann equations for neutral and ionic solutions, respectively. In the present work, solvers for both problems have been developed. A preconditioned conjugate gradient method has been implemented for the solution of the generalized Poisson equation and themore » linear regime of the Poisson-Boltzmann, allowing to solve iteratively the minimization problem with some ten iterations of the ordinary Poisson equation solver. In addition, a self-consistent procedure enables us to solve the non-linear Poisson-Boltzmann problem. Both solvers exhibit very high accuracy and parallel efficiency and allow for the treatment of periodic, free, and slab boundary conditions. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes.« less
Numerical Solutions for Nonlinear High Damping Rubber Bearing Isolators: Newmark's Method with Netwon-Raphson Iteration Revisited

NASA Astrophysics Data System (ADS)

Markou, A. A.; Manolis, G. D.

2018-03-01

Numerical methods for the solution of dynamical problems in engineering go back to 1950. The most famous and widely-used time stepping algorithm was developed by Newmark in 1959. In the present study, for the first time, the Newmark algorithm is developed for the case of the trilinear hysteretic model, a model that was used to describe the shear behaviour of high damping rubber bearings. This model is calibrated against free-vibration field tests implemented on a hybrid base isolated building, namely the Solarino project in Italy, as well as against laboratory experiments. A single-degree-of-freedom system is used to describe the behaviour of a low-rise building isolated with a hybrid system comprising high damping rubber bearings and low friction sliding bearings. The behaviour of the high damping rubber bearings is simulated by the trilinear hysteretic model, while the description of the behaviour of the low friction sliding bearings is modeled by a linear Coulomb friction model. In order to prove the effectiveness of the numerical method we compare the analytically solved trilinear hysteretic model calibrated from free-vibration field tests (Solarino project) against the same model solved with the Newmark method with Netwon-Raphson iteration. Almost perfect agreement is observed between the semi-analytical solution and the fully numerical solution with Newmark's time integration algorithm. This will allow for extension of the trilinear mechanical models to bidirectional horizontal motion, to time-varying vertical loads, to multi-degree-of-freedom-systems, as well to generalized models connected in parallel, where only numerical solutions are possible.
Image segmentation by iterative parallel region growing with application to data compression and image analysis

NASA Technical Reports Server (NTRS)

Tilton, James C.

1988-01-01

Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

NASA Astrophysics Data System (ADS)

Qin, Cheng-Zhi; Zhan, Lijun

2012-06-01

As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU-based algorithms based on existing parallelization strategies.
CAD-Based Shielding Analysis for ITER Port Diagnostics

NASA Astrophysics Data System (ADS)

Serikov, Arkady; Fischer, Ulrich; Anthoine, David; Bertalot, Luciano; De Bock, Maartin; O'Connor, Richard; Juarez, Rafael; Krasilnikov, Vitaly

2017-09-01

Radiation shielding analysis conducted in support of design development of the contemporary diagnostic systems integrated inside the ITER ports is relied on the use of CAD models. This paper presents the CAD-based MCNP Monte Carlo radiation transport and activation analyses for the Diagnostic Upper and Equatorial Port Plugs (UPP #3 and EPP #8, #17). The creation process of the complicated 3D MCNP models of the diagnostics systems was substantially accelerated by application of the CAD-to-MCNP converter programs MCAM and McCad. High performance computing resources of the Helios supercomputer allowed to speed-up the MCNP parallel transport calculations with the MPI/OpenMP interface. The found shielding solutions could be universal, reducing ports R&D costs. The shield block behind the Tritium and Deposit Monitor (TDM) optical box was added to study its influence on Shut-Down Dose Rate (SDDR) in Port Interspace (PI) of EPP#17. Influence of neutron streaming along the Lost Alpha Monitor (LAM) on the neutron energy spectra calculated in the Tangential Neutron Spectrometer (TNS) of EPP#8. For the UPP#3 with Charge eXchange Recombination Spectroscopy (CXRS-core), an excessive neutron streaming along the CXRS shutter, which should be prevented in further design iteration.
A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

NASA Technical Reports Server (NTRS)

Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

2005-01-01

A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel e ciency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

NASA Astrophysics Data System (ADS)

Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

2017-07-01

Multi-thread programming using OpenMP on the shared-memory architecture with hyperthreading technology allows the resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, its speedup depends on the ability of the processor to execute threads in limited quantities, especially the sequential algorithm which contains a nested loop. The number of the outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique that had been found previously only be applied by the high-level programmer. This paper generates a parallelization procedure for low-level programmer in dealing with 2-level nested loop problems with the maximum number of threads that can be executed by a processor is smaller than the number of the outer loop iterations. Data preprocessing which is related to the number of the outer loop and the inner loop iterations, the computational time required to execute each iteration and the maximum number of threads that can be executed by a processor are used as a strategy to determine which parallel region that will produce optimal speedup.

A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

NASA Astrophysics Data System (ADS)

Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

2016-10-01

Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
Multidisciplinary systems optimization by linear decomposition

NASA Technical Reports Server (NTRS)

Sobieski, J.

1984-01-01

In a typical design process major decisions are made sequentially. An illustrated example is given for an aircraft design in which the aerodynamic shape is usually decided first, then the airframe is sized for strength and so forth. An analogous sequence could be laid out for any other major industrial product, for instance, a ship. The loops in the discipline boxes symbolize iterative design improvements carried out within the confines of a single engineering discipline, or subsystem. The loops spanning several boxes depict multidisciplinary design improvement iterations. Omitted for graphical simplicity is parallelism of the disciplinary subtasks. The parallelism is important in order to develop a broad workfront necessary to shorten the design time. If all the intradisciplinary and interdisciplinary iterations were carried out to convergence, the process could yield a numerically optimal design. However, it usually stops short of that because of time and money limitations. This is especially true for the interdisciplinary iterations.
FAST TRACK PAPER: Non-iterative multiple-attenuation methods: linear inverse solutions to non-linear inverse problems - II. BMG approximation

NASA Astrophysics Data System (ADS)

Ikelle, Luc T.; Osen, Are; Amundsen, Lasse; Shen, Yunqing

2004-12-01

The classical linear solutions to the problem of multiple attenuation, like predictive deconvolution, τ-p filtering, or F-K filtering, are generally fast, stable, and robust compared to non-linear solutions, which are generally either iterative or in the form of a series with an infinite number of terms. These qualities have made the linear solutions more attractive to seismic data-processing practitioners. However, most linear solutions, including predictive deconvolution or F-K filtering, contain severe assumptions about the model of the subsurface and the class of free-surface multiples they can attenuate. These assumptions limit their usefulness. In a recent paper, we described an exception to this assertion for OBS data. We showed in that paper that a linear and non-iterative solution to the problem of attenuating free-surface multiples which is as accurate as iterative non-linear solutions can be constructed for OBS data. We here present a similar linear and non-iterative solution for attenuating free-surface multiples in towed-streamer data. For most practical purposes, this linear solution is as accurate as the non-linear ones.
Unweighted least squares phase unwrapping by means of multigrid techniques

NASA Astrophysics Data System (ADS)

Pritt, Mark D.

1995-11-01

We present a multigrid algorithm for unweighted least squares phase unwrapping. This algorithm applies Gauss-Seidel relaxation schemes to solve the Poisson equation on smaller, coarser grids and transfers the intermediate results to the finer grids. This approach forms the basis of our multigrid algorithm for weighted least squares phase unwrapping, which is described in a separate paper. The key idea of our multigrid approach is to maintain the partial derivatives of the phase data in separate arrays and to correct these derivatives at the boundaries of the coarser grids. This maintains the boundary conditions necessary for rapid convergence to the correct solution. Although the multigrid algorithm is an iterative algorithm, we demonstrate that it is nearly as fast as the direct Fourier-based method. We also describe how to parallelize the algorithm for execution on a distributed-memory parallel processor computer or a network-cluster of workstations.
Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

NASA Astrophysics Data System (ADS)

Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

2015-05-01

3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of electrohydrodynamic (EHD) ion-drag micropump. Finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, simulation results are also compared to examine the difference between the results. The main objective was to analyze the computational time required by both the methods with respect to different grid sizes and parallelize the Jacobi method to reduce the computational time. In common, the SGS method is faster than the SJ method but the data parallelism of Jacobi method may produce good speedup over SGS method. In this study, the feasibility of using parallel Jacobi (PJ) method is attempted in relation to SGS method. MATLAB Parallel/Distributed computing environment is used and a parallel code for SJ method is implemented. It was found that for small grid size the SGS method remains dominant over SJ method and PJ method while for large grid size both the sequential methods may take nearly too much processing time to converge. Yet, the PJ method reduces computational time to some extent for large grid sizes.
Application of the perturbation iteration method to boundary layer type problems.

PubMed

Pakdemirli, Mehmet

2016-01-01

The recently developed perturbation iteration method is applied to boundary layer type singular problems for the first time. As a preliminary work on the topic, the simplest algorithm of PIA(1,1) is employed in the calculations. Linear and nonlinear problems are solved to outline the basic ideas of the new solution technique. The inner and outer solutions are determined with the iteration algorithm and matched to construct a composite expansion valid within all parts of the domain. The solutions are contrasted with the available exact or numerical solutions. It is shown that the perturbation-iteration algorithm can be effectively used for solving boundary layer type problems.
Robust determination of the chemical potential in the pole expansion and selected inversion method for solving Kohn-Sham density functional theory

NASA Astrophysics Data System (ADS)

Jia, Weile; Lin, Lin

2017-10-01

Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single step of evaluation of the Fermi operator. Hence multiple evaluations are needed to be sequentially performed to compute the chemical potential to ensure the correct number of electrons within a given tolerance. This hinders the performance of FOE methods in practice. In this paper, we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential reaches its convergence along the SCF iteration. Instead of evaluating the Fermi operator for multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics.
Robust determination of the chemical potential in the pole expansion and selected inversion method for solving Kohn-Sham density functional theory.

PubMed

Jia, Weile; Lin, Lin

2017-10-14

Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single step of evaluation of the Fermi operator. Hence multiple evaluations are needed to be sequentially performed to compute the chemical potential to ensure the correct number of electrons within a given tolerance. This hinders the performance of FOE methods in practice. In this paper, we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential reaches its convergence along the SCF iteration. Instead of evaluating the Fermi operator for multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics.
Analysis and Design of ITER 1 MV Core Snubber

NASA Astrophysics Data System (ADS)

Wang, Haitian; Li, Ge

2012-11-01

The core snubber, as a passive protection device, can suppress arc current and absorb stored energy in stray capacitance during the electrical breakdown in accelerating electrodes of ITER NBI. In order to design the core snubber of ITER, the control parameters of the arc peak current have been firstly analyzed by the Fink-Baker-Owren (FBO) method, which are used for designing the DIIID 100 kV snubber. The B-H curve can be derived from the measured voltage and current waveforms, and the hysteresis loss of the core snubber can be derived using the revised parallelogram method. The core snubber can be a simplified representation as an equivalent parallel resistance and inductance, which has been neglected by the FBO method. A simulation code including the parallel equivalent resistance and inductance has been set up. The simulation and experiments result in dramatically large arc shorting currents due to the parallel inductance effect. The case shows that the core snubber utilizing the FBO method gives more compact design.
Towards a Better Distributed Framework for Learning Big Data

DTIC Science & Technology

2017-06-14

UNLIMITED: PB Public Release 13. SUPPLEMENTARY NOTES 14. ABSTRACT This work aimed at solving issues in distributed machine learning. The PI’s team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize a reinforcement policy learning. 15
Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Sankar, Lakshmi N.; Hixon, Duane

1992-01-01

The development of efficient iterative solution methods for the numerical solution of two- and three-dimensional compressible Navier-Stokes equations is discussed. Iterative time marching methods have several advantages over classical multi-step explicit time marching schemes, and non-iterative implicit time marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes. In this work, another approach based on the classical conjugate gradient method, known as the Generalized Minimum Residual (GMRES) algorithm is investigated. The GMRES algorithm has been used in the past by a number of researchers for solving steady viscous and inviscid flow problems. Here, we investigate the suitability of this algorithm for solving the system of non-linear equations that arise in unsteady Navier-Stokes solvers at each time step.
Incorporating prototyping and iteration into intervention development: a case study of a dining hall-based intervention.

PubMed

McClain, Arianna D; Hekler, Eric B; Gardner, Christopher D

2013-01-01

Previous research from the fields of computer science and engineering highlight the importance of an iterative design process (IDP) to create more creative and effective solutions. This study describes IDP as a new method for developing health behavior interventions and evaluates the effectiveness of a dining hall-based intervention developed using IDP on college students' eating behavior and values. participants were 458 students (52.6% female, age = 19.6 ± 1.5 years [M ± SD]). The intervention was developed via an IDP parallel process. A cluster-randomized controlled study compared differences in eating behavior among students in 4 university dining halls (2 intervention, 2 control). The final intervention was a multicomponent, point-of-selection marketing campaign. Students in the intervention dining halls consumed significantly less junk food and high-fat meat and increased their perceived importance of eating a healthful diet relative to the control group. IDP may be valuable for the development of behavior change interventions.
Three-dimensional Finite Element Formulation and Scalable Domain Decomposition for High Fidelity Rotor Dynamic Analysis

NASA Technical Reports Server (NTRS)

Datta, Anubhav; Johnson, Wayne R.

2009-01-01

This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
On the Development of an Efficient Parallel Hybrid Solver with Application to Acoustically Treated Aero-Engine Nacelles

NASA Technical Reports Server (NTRS)

Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj

2006-01-01

A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
The application of contraction theory to an iterative formulation of electromagnetic scattering

NASA Technical Reports Server (NTRS)

Brand, J. C.; Kauffman, J. F.

1985-01-01

Contraction theory is applied to an iterative formulation of electromagnetic scattering from periodic structures and a computational method for insuring convergence is developed. A short history of spectral (or k-space) formulation is presented with an emphasis on application to periodic surfaces. To insure a convergent solution of the iterative equation, a process called the contraction corrector method is developed. Convergence properties of previously presented iterative solutions to one-dimensional problems are examined utilizing contraction theory and the general conditions for achieving a convergent solution are explored. The contraction corrector method is then applied to several scattering problems including an infinite grating of thin wires with the solution data compared to previous works.
Multiple Revolution Solutions for the Perturbed Lambert Problem using the Method of Particular Solutions and Picard Iteration

NASA Astrophysics Data System (ADS)

Woollands, Robyn M.; Read, Julie L.; Probe, Austin B.; Junkins, John L.

2017-12-01

We present a new method for solving the multiple revolution perturbed Lambert problem using the method of particular solutions and modified Chebyshev-Picard iteration. The method of particular solutions differs from the well-known Newton-shooting method in that integration of the state transition matrix (36 additional differential equations) is not required, and instead it makes use of a reference trajectory and a set of n particular solutions. Any numerical integrator can be used for solving two-point boundary problems with the method of particular solutions, however we show that using modified Chebyshev-Picard iteration affords an avenue for increased efficiency that is not available with other step-by-step integrators. We take advantage of the path approximation nature of modified Chebyshev-Picard iteration (nodes iteratively converge to fixed points in space) and utilize a variable fidelity force model for propagating the reference trajectory. Remarkably, we demonstrate that computing the particular solutions with only low fidelity function evaluations greatly increases the efficiency of the algorithm while maintaining machine precision accuracy. Our study reveals that solving the perturbed Lambert's problem using the method of particular solutions with modified Chebyshev-Picard iteration is about an order of magnitude faster compared with the classical shooting method and a tenth-twelfth order Runge-Kutta integrator. It is well known that the solution to Lambert's problem over multiple revolutions is not unique and to ensure that all possible solutions are considered we make use of a reliable preexisting Keplerian Lambert solver to warm start our perturbed algorithm.
Decomposition of timed automata for solving scheduling problems

NASA Astrophysics Data System (ADS)

Nishi, Tatsushi; Wakatake, Masato

2014-03-01

A decomposition algorithm for scheduling problems based on timed automata (TA) model is proposed. The problem is represented as an optimal state transition problem for TA. The model comprises of the parallel composition of submodels such as jobs and resources. The procedure of the proposed methodology can be divided into two steps. The first step is to decompose the TA model into several submodels by using decomposable condition. The second step is to combine individual solution of subproblems for the decomposed submodels by the penalty function method. A feasible solution for the entire model is derived through the iterated computation of solving the subproblem for each submodel. The proposed methodology is applied to solve flowshop and jobshop scheduling problems. Computational experiments demonstrate the effectiveness of the proposed algorithm compared with a conventional TA scheduling algorithm without decomposition.
Domain decomposition algorithms and computation fluid dynamics

NASA Technical Reports Server (NTRS)

Chan, Tony F.

1988-01-01

In the past several years, domain decomposition was a very popular topic, partly motivated by the potential of parallelization. While a large body of theory and algorithms were developed for model elliptic problems, they are only recently starting to be tested on realistic applications. The application of some of these methods to two model problems in computational fluid dynamics are investigated. Some examples are two dimensional convection-diffusion problems and the incompressible driven cavity flow problem. The construction and analysis of efficient preconditioners for the interface operator to be used in the iterative solution of the interface solution is described. For the convection-diffusion problems, the effect of the convection term and its discretization on the performance of some of the preconditioners is discussed. For the driven cavity problem, the effectiveness of a class of boundary probe preconditioners is discussed.
Acoustic 3D modeling by the method of integral equations

NASA Astrophysics Data System (ADS)

Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

2018-02-01

This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation, based on the fast Fourier transform (FFT). We demonstrate that, the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that, the method could accurately calculate the seismic field for the models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations, and also for parallelizing across multiple sources. The practical examples and efficiency tests are presented as well.
An innovative hybrid 3D analytic-numerical model for air breathing parallel channel counter-flow PEM fuel cells.

PubMed

Tavčar, Gregor; Katrašnik, Tomaž

2014-01-01

The parallel straight channel PEM fuel cell model presented in this paper extends the innovative hybrid 3D analytic-numerical (HAN) approach previously published by the authors with capabilities to address ternary diffusion systems and counter-flow configurations. The model's core principle is modelling species transport by obtaining a 2D analytic solution for species concentration distribution in the plane perpendicular to the cannel gas-flow and coupling consecutive 2D solutions by means of a 1D numerical pipe-flow model. Electrochemical and other nonlinear phenomena are coupled to the species transport by a routine that uses derivative approximation with prediction-iteration. The latter is also the core of the counter-flow computation algorithm. A HAN model of a laboratory test fuel cell is presented and evaluated against a professional 3D CFD simulation tool showing very good agreement between results of the presented model and those of the CFD simulation. Furthermore, high accuracy results are achieved at moderate computational times, which is owed to the semi-analytic nature and to the efficient computational coupling of electrochemical kinetics and species transport.

Improved interpretation of satellite altimeter data using genetic algorithms

NASA Technical Reports Server (NTRS)

Messa, Kenneth; Lybanon, Matthew

1992-01-01

Genetic algorithms (GA) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GA's have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
A stopping criterion for the iterative solution of partial differential equations

NASA Astrophysics Data System (ADS)

Rao, Kaustubh; Malan, Paul; Perot, J. Blair

2018-01-01

A stopping criterion for iterative solution methods is presented that accurately estimates the solution error using low computational overhead. The proposed criterion uses information from prior solution changes to estimate the error. When the solution changes are noisy or stagnating it reverts to a less accurate but more robust, low-cost singular value estimate to approximate the error given the residual. This estimator can also be applied to iterative linear matrix solvers such as Krylov subspace or multigrid methods. Examples of the stopping criterion's ability to accurately estimate the non-linear and linear solution error are provided for a number of different test cases in incompressible fluid dynamics.
Equivalent charge source model based iterative maximum neighbor weight for sparse EEG source localization.

PubMed

Xu, Peng; Tian, Yin; Lei, Xu; Hu, Xiao; Yao, Dezhong

2008-12-01

How to localize the neural electric activities within brain effectively and precisely from the scalp electroencephalogram (EEG) recordings is a critical issue for current study in clinical neurology and cognitive neuroscience. In this paper, based on the charge source model and the iterative re-weighted strategy, proposed is a new maximum neighbor weight based iterative sparse source imaging method, termed as CMOSS (Charge source model based Maximum neighbOr weight Sparse Solution). Different from the weight used in focal underdetermined system solver (FOCUSS) where the weight for each point in the discrete solution space is independently updated in iterations, the new designed weight for each point in each iteration is determined by the source solution of the last iteration at both the point and its neighbors. Using such a new weight, the next iteration may have a bigger chance to rectify the local source location bias existed in the previous iteration solution. The simulation studies with comparison to FOCUSS and LORETA for various source configurations were conducted on a realistic 3-shell head model, and the results confirmed the validation of CMOSS for sparse EEG source localization. Finally, CMOSS was applied to localize sources elicited in a visual stimuli experiment, and the result was consistent with those source areas involved in visual processing reported in previous studies.
On the convergence of an iterative formulation of the electromagnetic scattering from an infinite grating of thin wires

NASA Technical Reports Server (NTRS)

Brand, J. C.

1985-01-01

Contraction theory is applied to an iterative formulation of electromagnetic scattering from periodic structures and a computational method for insuring convergence is developed. A short history of spectral (or k-space) formulation is presented with an emphasis on application to periodic surfaces. The mathematical background for formulating an iterative equation is covered using straightforward single variable examples including an extension to vector spaces. To insure a convergent solution of the iterative equation, a process called the contraction corrector method is developed. Convergence properties of previously presented iterative solutions to one-dimensional problems are examined utilizing contraction theory and the general conditions for achieving a convergent solution are explored. The contraction corrector method is then applied to several scattering problems including an infinite grating of thin wires with the solution data compared to previous works.
Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram

2013-04-09

A novel parallel algorithm for non-iterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [K. Bhaskaran-Nair, J.Brabec, E. Aprà, H.J.J. van Dam, J. Pittner, K. Kowalski, J. Chem. Phys. 137, 094112 (2012)] with the possibility of accelerating numerical calculations using graphics processing unit (GPU) is presented. We discuss the performance of this algorithm on the example of the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD (iterative singles and doubles) effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithmmore » is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.« less
Parallel Reconstruction Using Null Operations (PRUNO)

PubMed Central

Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

2011-01-01

A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Aeroelastic optimization methodology for viscous and turbulent flows

NASA Astrophysics Data System (ADS)

Barcelos Junior, Manuel Nascimento Dias

2007-12-01

In recent years, the development of faster computers and parallel processing allowed the application of high-fidelity analysis methods to the aeroelastic design of aircraft. However, these methods are restricted to the final design verification, mainly due to the computational cost involved in iterative design processes. Therefore, this work is concerned with the creation of a robust and efficient aeroelastic optimization methodology for inviscid, viscous and turbulent flows by using high-fidelity analysis and sensitivity analysis techniques. Most of the research in aeroelastic optimization, for practical reasons, treat the aeroelastic system as a quasi-static inviscid problem. In this work, as a first step toward the creation of a more complete aeroelastic optimization methodology for realistic problems, an analytical sensitivity computation technique was developed and tested for quasi-static aeroelastic viscous and turbulent flow configurations. Viscous and turbulent effects are included by using an averaged discretization of the Navier-Stokes equations, coupled with an eddy viscosity turbulence model. For quasi-static aeroelastic problems, the traditional staggered solution strategy has unsatisfactory performance when applied to cases where there is a strong fluid-structure coupling. Consequently, this work also proposes a solution methodology for aeroelastic and sensitivity analyses of quasi-static problems, which is based on the fixed point of an iterative nonlinear block Gauss-Seidel scheme. The methodology can also be interpreted as the solution of the Schur complement of the aeroelastic and sensitivity analyses linearized systems of equations. The methodologies developed in this work are tested and verified by using realistic aeroelastic systems.
Domain decomposition method for the Baltic Sea based on theory of adjoint equation and inverse problem.

NASA Astrophysics Data System (ADS)

Lezina, Natalya; Agoshkov, Valery

2017-04-01

Domain decomposition method (DDM) allows one to present a domain with complex geometry as a set of essentially simpler subdomains. This method is particularly applied for the hydrodynamics of oceans and seas. In each subdomain the system of thermo-hydrodynamic equations in the Boussinesq and hydrostatic approximations is solved. The problem of obtaining solution in the whole domain is that it is necessary to combine solutions in subdomains. For this purposes iterative algorithm is created and numerical experiments are conducted to investigate an effectiveness of developed algorithm using DDM. For symmetric operators in DDM, Poincare-Steklov's operators [1] are used, but for the problems of the hydrodynamics, it is not suitable. In this case for the problem, adjoint equation method [2] and inverse problem theory are used. In addition, it is possible to create algorithms for the parallel calculations using DDM on multiprocessor computer system. DDM for the model of the Baltic Sea dynamics is numerically studied. The results of numerical experiments using DDM are compared with the solution of the system of hydrodynamic equations in the whole domain. The work was supported by the Russian Science Foundation (project 14-11-00609, the formulation of the iterative process and numerical experiments). [1] V.I. Agoshkov, Domain Decompositions Methods in the Mathematical Physics Problem // Numerical processes and systems, No 8, Moscow, 1991 (in Russian). [2] V.I. Agoshkov, Optimal Control Approaches and Adjoint Equations in the Mathematical Physics Problem, Institute of Numerical Mathematics, RAS, Moscow, 2003 (in Russian).
Efficient Solution of Three-Dimensional Problems of Acoustic and Electromagnetic Scattering by Open Surfaces

NASA Technical Reports Server (NTRS)

Turc, Catalin; Anand, Akash; Bruno, Oscar; Chaubell, Julian

2011-01-01

We present a computational methodology (a novel Nystrom approach based on use of a non-overlapping patch technique and Chebyshev discretizations) for efficient solution of problems of acoustic and electromagnetic scattering by open surfaces. Our integral equation formulations (1) Incorporate, as ansatz, the singular nature of open-surface integral-equation solutions, and (2) For the Electric Field Integral Equation (EFIE), use analytical regularizes that effectively reduce the number of iterations required by iterative linear-algebra solution based on Krylov-subspace iterative solvers.
Parallelization of implicit finite difference schemes in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel

1990-01-01

Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

1992-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
A Comparison of Solver Performance for Complex Gastric Electrophysiology Models

PubMed Central

Sathar, Shameer; Cheng, Leo K.; Trew, Mark L.

2016-01-01

Computational techniques for solving systems of equations arising in gastric electrophysiology have not been studied for efficient solution process. We present a computationally challenging problem of simulating gastric electrophysiology in anatomically realistic stomach geometries with multiple intracellular and extracellular domains. The multiscale nature of the problem and mesh resolution required to capture geometric and functional features necessitates efficient solution methods if the problem is to be tractable. In this study, we investigated and compared several parallel preconditioners for the linear systems arising from tetrahedral discretisation of electrically isotropic and anisotropic problems, with and without stimuli. The results showed that the isotropic problem was computationally less challenging than the anisotropic problem and that the application of extracellular stimuli increased workload considerably. Preconditioning based on block Jacobi and algebraic multigrid solvers were found to have the best overall solution times and least iteration counts, respectively. The algebraic multigrid preconditioner would be expected to perform better on large problems. PMID:26736543
Sensitivity calculations for iteratively solved problems

NASA Technical Reports Server (NTRS)

Haftka, R. T.

1985-01-01

The calculation of sensitivity derivatives of solutions of iteratively solved systems of algebraic equations is investigated. A modified finite difference procedure is presented which improves the accuracy of the calculated derivatives. The procedure is demonstrated for a simple algebraic example as well as an element-by-element preconditioned conjugate gradient iterative solution technique applied to truss examples.
Parallel Implementation of 3-D Iterative Reconstruction With Intra-Thread Update for the jPET-D4

NASA Astrophysics Data System (ADS)

Lam, Chih Fung; Yamaya, Taiga; Obi, Takashi; Yoshida, Eiji; Inadama, Naoko; Shibuya, Kengo; Nishikido, Fumihiko; Murayama, Hideo

2009-02-01

One way to speed-up iterative image reconstruction is by parallel computing with a computer cluster. However, as the number of computing threads increases, parallel efficiency decreases due to network transfer delay. In this paper, we proposed a method to reduce data transfer between computing threads by introducing an intra-thread update. The update factor is collected from each slave thread and a global image is updated as usual in the first K sub-iteration. In the rest of the sub-iterations, the global image is only updated at an interval which is controlled by a parameter L. In between that interval, the intra-thread update is carried out whereby an image update is performed in each slave thread locally. We investigated combinations of K and L parameters based on parallel implementation of RAMLA for the jPET-D4 scanner. Our evaluation used four workstations with a total of 16 slave threads. Each slave thread calculated a different set of LORs which are divided according to ring difference numbers. We assessed image quality of the proposed method with a hotspot simulation phantom. The figure of merit was the full-width-half-maximum of hotspots and the background normalized standard deviation. At an optimum K and L setting, we did not find significant change in the output images. We also applied the proposed method to a Hoffman phantom experiment and found the difference due to intra-thread update was negligible. With the intra-thread update, computation time could be reduced by about 23%.
DL_MG: A Parallel Multigrid Poisson and Poisson-Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution.

PubMed

Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton

2018-03-13

The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential-a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10 9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
Separation of Undersampled Composite Signals Using the Dantzig Selector with Overcomplete Dictionaries

DTIC Science & Technology

2014-06-02

2011). [22] Li, Q., Micchelli, C., Shen, L., and Xu, Y. A proximity algorithm acelerated by Gauss - Seidel iterations for L1/TV denoising models. Inverse...system of equations and their relationship to the solution of Model (2) and present an algorithm with an iterative approach for finding these solutions...Using the fixed-point characterization above, the (k + 1)th iteration of the prox- imity operator algorithm to find the solution of the Dantzig
DGDFT: A massively parallel method for large scale density functional theory calculations.

PubMed

Hu, Wei; Lin, Lin; Yang, Chao

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue

NASA Astrophysics Data System (ADS)

Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.

2017-11-01

The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
Barrier-breaking performance for industrial problems on the CRAY C916

DOE Office of Scientific and Technical Information (OSTI.GOV)

Graffunder, S.K.

1993-12-31

Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45 speedup from the compiler with just two hand-inserted directives to scope variables properly for themore » mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.« less
NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism Correctness

DTIC Science & Technology

2011-12-16

other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a...Lab affiliates National Instruments, NEC, Nokia , NVIDIA, and Samsung. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism...concurrently update x, some of these CAS’s will fail and those parallel loop iterations will recompute their updates to x and try again. Consider the parallel

Learning control system design based on 2-D theory - An application to parallel link manipulator

NASA Technical Reports Server (NTRS)

Geng, Z.; Carroll, R. L.; Lee, J. D.; Haynes, L. H.

1990-01-01

An approach to iterative learning control system design based on two-dimensional system theory is presented. A two-dimensional model for the iterative learning control system which reveals the connections between learning control systems and two-dimensional system theory is established. A learning control algorithm is proposed, and the convergence of learning using this algorithm is guaranteed by two-dimensional stability. The learning algorithm is applied successfully to the trajectory tracking control problem for a parallel link robot manipulator. The excellent performance of this learning algorithm is demonstrated by the computer simulation results.
D4Z - a new renumbering for iterative solution of ground-water flow and solute- transport equations

USGS Publications Warehouse

Kipp, K.L.; Russell, T.F.; Otto, J.S.

1992-01-01

D4 zig-zag (D4Z) is a new renumbering scheme for producing a reduced matrix to be solved by an incomplete LU preconditioned, restarted conjugate-gradient iterative solver. By renumbering alternate diagonals in a zig-zag fashion, a very low sensitivity of convergence rate to renumbering direction is obtained. For two demonstration problems involving groundwater flow and solute transport, iteration counts are related to condition numbers and spectra of the reduced matrices.
Soft-output decoding algorithms in iterative decoding of turbo codes

NASA Technical Reports Server (NTRS)

Benedetto, S.; Montorsi, G.; Divsalar, D.; Pollara, F.

1996-01-01

In this article, we present two versions of a simplified maximum a posteriori decoding algorithm. The algorithms work in a sliding window form, like the Viterbi algorithm, and can thus be used to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation is also given of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes (turbo codes). The performances of the two algorithms are compared on the basis of a powerful rate 1/3 parallel concatenated code. Basic circuits to implement the simplified a posteriori decoding algorithm using lookup tables, and two further approximations (linear and threshold), with a very small penalty, to eliminate the need for lookup tables are proposed.
Kinematics and dynamics of a six-degree-of-freedom robot manipulator with closed kinematic chain mechanism

NASA Technical Reports Server (NTRS)

Nguyen, Charles C.; Pooran, Farhad J.

1989-01-01

This paper deals with a class of robot manipulators built based on the kinematic chain mechanism (CKCM). This class of CKCM manipulators consists of a fixed and a moving platform coupled together via a number of in-parallel actuators. A closed-form solution is derived for the inverse kinematic problem of a six-degre-of-freedom CKCM manipulator designed to study robotic applications in space. Iterative Newton-Raphson method is employed to solve the forward kinematic problem. Dynamics of the above manipulator is derived using the Lagrangian approach. Computer simulation of the dynamical equations shows that the actuating forces are strongly dependent on the mass and centroid of the robot links.
Coupled CFD-Thermal Analysis of Erosion Patterns Resulting from Nozzle Wedgeouts on the SRTMV-N2

NASA Technical Reports Server (NTRS)

Ables, Catherine; Davis, Philip

2014-01-01

The objective of this analysis was to study the effects of the erosion patterns from the introduction of nozzle flaws machined into the nozzle of the SRTMV-N2 (Solid Rocket Test Motor V Nozzle 2). The SRTMV-N2 motor was a single segment static subscale solid rocket motor used to further develop the RSRMV (Redesigned Solid Rocket Motor V Segment). Two flaws or "wedgeouts" were placed in the nozzle inlet parallel to the ply angles of that section to study erosion effects. One wedgeout was placed in the nose cap region and the other placed in the inlet ring on the opposite side of the bondline, separated 180 degrees circumferentially. A coupled CFD (Computational Fluid Analysis)-thermal iterative analytical approach was utilized at the wedgeouts to analyze the erosion profile during the burn time. The iterative CFD thermal approach was applied at five second intervals throughout the motor burn. The coupled fluid thermal boundary conditions were derived from a steady state CFD solution at the beginning of the interval. The derived heat fluxes were then applied along the surface and a transient thermal solution was developed to characterize the material response over the specified interval. Eroded profiles of each of the nozzle's wedgeouts and the original contour were created at each of the specified intervals. The final iteration of the erosion profile showed that both wedgeouts were "washedout," indicating that the erosion profile of the wedgeout had rejoined the original eroded contour, leaving no trace of the wedgeouts post fire. This analytical assessment agreed with post-fire observations made of the SRTMV-N2 wedgeouts, which noted a smooth eroded contour.
Multiple solution of linear algebraic systems by an iterative method with recomputed preconditioner in the analysis of microstrip structures

NASA Astrophysics Data System (ADS)

Ahunov, Roman R.; Kuksenko, Sergey P.; Gazizov, Talgat R.

2016-06-01

A multiple solution of linear algebraic systems with dense matrix by iterative methods is considered. To accelerate the process, the recomputing of the preconditioning matrix is used. A priory condition of the recomputing based on change of the arithmetic mean of the current solution time during the multiple solution is proposed. To confirm the effectiveness of the proposed approach, the numerical experiments using iterative methods BiCGStab and CGS for four different sets of matrices on two examples of microstrip structures are carried out. For solution of 100 linear systems the acceleration up to 1.6 times, compared to the approach without recomputing, is obtained.
Krylov subspace iterative methods for boundary element method based near-field acoustic holography.

PubMed

Valdivia, Nicolas; Williams, Earl G

2005-02-01

The reconstruction of the acoustic field for general surfaces is obtained from the solution of a matrix system that results from a boundary integral equation discretized using boundary element methods. The solution to the resultant matrix system is obtained using iterative regularization methods that counteract the effect of noise on the measurements. These methods will not require the calculation of the singular value decomposition, which can be expensive when the matrix system is considerably large. Krylov subspace methods are iterative methods that have the phenomena known as "semi-convergence," i.e., the optimal regularization solution is obtained after a few iterations. If the iteration is not stopped, the method converges to a solution that generally is totally corrupted by errors on the measurements. For these methods the number of iterations play the role of the regularization parameter. We will focus our attention to the study of the regularizing properties from the Krylov subspace methods like conjugate gradients, least squares QR and the recently proposed Hybrid method. A discussion and comparison of the available stopping rules will be included. A vibrating plate is considered as an example to validate our results.
Global Patch Matching

NASA Astrophysics Data System (ADS)

Huang, X.; Hu, K.; Ling, X.; Zhang, Y.; Lu, Z.; Zhou, G.

2017-09-01

This paper introduces a novel global patch matching method that focuses on how to remove fronto-parallel bias and obtain continuous smooth surfaces with assuming that the scenes covered by stereos are piecewise continuous. Firstly, simple linear iterative cluster method (SLIC) is used to segment the base image into a series of patches. Then, a global energy function, which consists of a data term and a smoothness term, is built on the patches. The data term is the second-order Taylor expansion of correlation coefficients, and the smoothness term is built by combing connectivity constraints and the coplanarity constraints are combined to construct the smoothness term. Finally, the global energy function can be built by combining the data term and the smoothness term. We rewrite the global energy function in a quadratic matrix function, and use least square methods to obtain the optimal solution. Experiments on Adirondack stereo and Motorcycle stereo of Middlebury benchmark show that the proposed method can remove fronto-parallel bias effectively, and produce continuous smooth surfaces.
Comparison of multihardware parallel implementations for a phase unwrapping algorithm

NASA Astrophysics Data System (ADS)

Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

2018-04-01

Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
A study on Marangoni convection by the variational iteration method

NASA Astrophysics Data System (ADS)

Karaoǧlu, Onur; Oturanç, Galip

2012-09-01

In this paper, we will consider the use of the variational iteration method and Padé approximant for finding approximate solutions for a Marangoni convection induced flow over a free surface due to an imposed temperature gradient. The solutions are compared with the numerical (fourth-order Runge Kutta) solutions.
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

PubMed

Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

2012-01-01

We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

NASA Astrophysics Data System (ADS)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.

2018-01-01

We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
An interative solution of an integral equation for radiative transfer by using variational technique

NASA Technical Reports Server (NTRS)

Yoshikawa, K. K.

1973-01-01

An effective iterative technique is introduced to solve a nonlinear integral equation frequently associated with radiative transfer problems. The problem is formulated in such a way that each step of an iterative sequence requires the solution of a linear integral equation. The advantage of a previously introduced variational technique which utilizes a stepwise constant trial function is exploited to cope with the nonlinear problem. The method is simple and straightforward. Rapid convergence is obtained by employing a linear interpolation of the iterative solutions. Using absorption coefficients of the Milne-Eddington type, which are applicable to some planetary atmospheric radiation problems. Solutions are found in terms of temperature and radiative flux. These solutions are presented numerically and show excellent agreement with other numerical solutions.
Gravity-Assist Trajectories to the Ice Giants: An Automated Method to Catalog Mass- Or Time-Optimal Solutions

NASA Technical Reports Server (NTRS)

Hughes, Kyle M.; Knittel, Jeremy M.; Englander, Jacob A.

2017-01-01

This work presents an automated method of calculating mass (or time) optimal gravity-assist trajectories without a priori knowledge of the flyby-body combination. Since gravity assists are particularly crucial for reaching the outer Solar System, we use the Ice Giants, Uranus and Neptune, as example destinations for this work. Catalogs are also provided that list the most attractive trajectories found over launch dates ranging from 2024 to 2038. The tool developed to implement this method, called the Python EMTG Automated Trade Study Application (PEATSA), iteratively runs the Evolutionary Mission Trajectory Generator (EMTG), a NASA Goddard Space Flight Center in-house trajectory optimization tool. EMTG finds gravity-assist trajectories with impulsive maneuvers using a multiple-shooting structure along with stochastic methods (such as monotonic basin hopping) and may be run with or without an initial guess provided. PEATSA runs instances of EMTG in parallel over a grid of launch dates. After each set of runs completes, the best results within a neighborhood of launch dates are used to seed all other cases in that neighborhood-allowing the solutions across the range of launch dates to improve over each iteration. The results here are compared against trajectories found using a grid-search technique, and PEATSA is found to outperform the grid-search results for most launch years considered.
Gravity-Assist Trajectories to the Ice Giants: An Automated Method to Catalog Mass-or Time-Optimal Solutions

NASA Technical Reports Server (NTRS)

Hughes, Kyle M.; Knittel, Jeremy M.; Englander, Jacob A.

2017-01-01

This work presents an automated method of calculating mass (or time) optimal gravity-assist trajectories without a priori knowledge of the flyby-body combination. Since gravity assists are particularly crucial for reaching the outer Solar System, we use the Ice Giants, Uranus and Neptune, as example destinations for this work. Catalogs are also provided that list the most attractive trajectories found over launch dates ranging from 2024 to 2038. The tool developed to implement this method, called the Python EMTG Automated Trade Study Application (PEATSA), iteratively runs the Evolutionary Mission Trajectory Generator (EMTG), a NASA Goddard Space Flight Center in-house trajectory optimization tool. EMTG finds gravity-assist trajectories with impulsive maneuvers using a multiple-shooting structure along with stochastic methods (such as monotonic basin hopping) and may be run with or without an initial guess provided. PEATSA runs instances of EMTG in parallel over a grid of launch dates. After each set of runs completes, the best results within a neighborhood of launch dates are used to seed all other cases in that neighborhood---allowing the solutions across the range of launch dates to improve over each iteration. The results here are compared against trajectories found using a grid-search technique, and PEATSA is found to outperform the grid-search results for most launch years considered.
Successive Over-Relaxation Technique for High-Performance Blind Image Deconvolution

DTIC Science & Technology

2015-06-08

deconvolution, space surveillance, Gauss - Seidel iteration 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT SAR 18, NUMBER OF PAGES 5...sensible approximate solutions to the ill-posed nonlinear inverse problem. These solutions are addresses as fixed points of the iteration which consists in...alternating approximations (AA) for the object and for the PSF performed with a prescribed number of inner iterative descents from trivial (zero
Hybrid parallel code acceleration methods in full-core reactor physics calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Courau, T.; Plagne, L.; Ponicot, A.

2012-07-01

When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses a 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadraturemore » required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies that are less than 25 pcm for the k{sub eff}, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively. (authors)« less
Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

NASA Astrophysics Data System (ADS)

Marx, Alain; Lütjens, Hinrich

2017-03-01

A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
R2D2-A Fortran Program for Two-Dimensional Chemically Reacting, Hyperthermal, Internal Flows, Volume II.

DTIC Science & Technology

1980-01-01

is identified in the flow chart simply as "Compute VECT’s ( predictor solution)" and "Compute V’s ( corrector solution)." A significant portion of the...TrintoTo Tm ANDera ionT SToION 28 ITIME :1 PRINCIPAL SUBROUTINES WALLPOINT (ITER,DT) ITER - iteration index for MacCormack Algorithm (ITER=1 for predictor ...WEILERSTEIN, R RAY, 6 MILLER F33615-7- C -3016UNLASSIFIED GASL-TR-254-VBL-2 AFFDL-TR-79-3162-VOL-2 NII III hImllllllllll EIEIIIIIIEIIEE EEIIIIIIIIIIII H
Perturbation-iteration theory for analyzing microwave striplines

NASA Technical Reports Server (NTRS)

Kretch, B. E.

1985-01-01

A perturbation-iteration technique is presented for determining the propagation constant and characteristic impedance of an unshielded microstrip transmission line. The method converges to the correct solution with a few iterations at each frequency and is equivalent to a full wave analysis. The perturbation-iteration method gives a direct solution for the propagation constant without having to find the roots of a transcendental dispersion equation. The theory is presented in detail along with numerical results for the effective dielectric constant and characteristic impedance for a wide range of substrate dielectric constants, stripline dimensions, and frequencies.

Parallel fast multipole boundary element method applied to computational homogenization

NASA Astrophysics Data System (ADS)

Ptaszny, Jacek

2018-01-01

In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problem is developed and applied to the computational homogenization of a solid containing spherical voids. The system of equation is solved by using the GMRES iterative solver. The boundary of the body is dicretized by using the quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of number of threads is examined.
Parallel iterative methods for sparse linear and nonlinear equations

NASA Technical Reports Server (NTRS)

Saad, Youcef

1989-01-01

As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable, when solving linear as well as nonlinear systems of equations. There has been several different approaches taken to adapt iterative methods for supercomputers. Some of these approaches are discussed and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.
The Battlefield Environment Division Modeling Framework (BMF). Part 1: Optimizing the Atmospheric Boundary Layer Environment Model for Cluster Computing

DTIC Science & Technology

2014-02-01

idle waiting for the wavefront to reach it. To overcome this, Reeve et al. (2001) 3 developed a scheme in analogy to the red-black Gauss - Seidel iterative ...understandable procedure calls. Parallelization of the SIMPLE iterative scheme with SIP used a red-black scheme similar to the red-black Gauss - Seidel ...scheme, the SIMPLE method, for pressure-velocity coupling. The result is a slowing convergence of the outer iterations . The red-black scheme excites a 2
Iteration with Spreadsheets.

ERIC Educational Resources Information Center

Smith, Michael

1990-01-01

Presents several examples of the iteration method using computer spreadsheets. Examples included are simple iterative sequences and the solution of equations using the Newton-Raphson formula, linear interpolation, and interval bisection. (YP)
Modern Workflow Full Waveform Inversion Applied to North America and the Northern Atlantic

NASA Astrophysics Data System (ADS)

Krischer, Lion; Fichtner, Andreas; Igel, Heiner

2015-04-01

We present the current state of a new seismic tomography model obtained using full waveform inversion of the crustal and upper mantle structure beneath North America and the Northern Atlantic, including the westernmost part of Europe. Parts of the eastern portion of the initial model consists of previous models by Fichtner et al. (2013) and Rickers et al. (2013). The final results of this study will contribute to the 'Comprehensive Earth Model' being developed by the Computational Seismology group at ETH Zurich. Significant challenges include the size of the domain, the uneven event and station coverage, and the strong east-west alignment of seismic ray paths across the North Atlantic. We use as much data as feasible, resulting in several thousand recordings per event depending on the receivers deployed at the earthquakes' origin times. To manage such projects in a reproducible and collaborative manner, we, as tomographers, should abandon ad-hoc scripts and one-time programs, and adopt sustainable and reusable solutions. Therefore we developed the LArge-scale Seismic Inversion Framework (LASIF - http://lasif.net), an open-source toolbox for managing seismic data in the context of non-linear iterative inversions that greatly reduces the time to research. Information on the applied processing, modelling, iterative model updating, what happened during each iteration, and so on are systematically archived. This results in a provenance record of the final model which in the end significantly enhances the reproducibility of iterative inversions. Additionally, tools for automated data download across different data centers, window selection, misfit measurements, parallel data processing, and input file generation for various forward solvers are provided.
Optical CT imaging of solid radiochromic dosimeters in mismatched refractive index solutions using a scanning laser and large area detector.

PubMed

Dekker, Kurtis H; Battista, Jerry J; Jordan, Kevin J

2016-08-01

The practical use of the PRESAGE® solid plastic dosimeter is limited by the inconvenience of immersing it in high-viscosity oils to achieve refractive index matching for optical computed tomography (CT) scanning. The oils are slow to mix and difficult to clean from surfaces, and the dosimeter rotation can generate dynamic Schlieren inhomogeneity patterns in the reference liquid, limiting the rotational and overall scan speed. Therefore, it would be beneficial if lower-viscosity, water-based solutions with slightly unmatched refractive index could be used instead. The purpose of this work is to demonstrate the feasibility of allowing mismatched conditions when using a scanning laser system with a large acceptance angle detector. A fiducial-based ray path measurement technique is combined with an iterative CT reconstruction algorithm to reconstruct images. A water based surrounding liquid with a low viscosity was selected for imaging PRESAGE® solid dosimeters. Liquid selection was optimized to achieve as high a refractive index as possible while avoiding rotation-induced Schlieren effects. This led to a refractive index mismatch of 6% between liquid and dosimeters. Optical CT scans were performed with a fan-beam scanning-laser optical CT system with a large area detector to capture most of the refracted rays. A fiducial marker placed on the wall of a cylindrical sample occludes a given light ray twice. With knowledge of the rotation angle and the radius of the cylindrical object, the actual internal path of each ray through the dosimeter can be calculated. Scans were performed with 1024 projections of 512 data samples each, and rays were rebinned to form 512 parallel-beam projections. Reconstructions were performed on a 512 × 512 grid using 100 iterations of the SIRT iterative CT algorithm. Proof of concept was demonstrated with a uniformly attenuating solution phantom. PRESAGE® dosimeters (11 cm diameter) were irradiated with Cobalt-60 irradiator to achieve either a uniform dose or a 2-level "step-dose" pattern. With 6% refractive index mismatching, a circular field of view of 85% of the diameter of a cylindrical sample can be reconstructed accurately. Reconstructed images of the test solution phantom were uniform (within 3%) inside this radius. However, the dose responses of the PRESAGE® samples were not spatially uniform, with variations of at least 5% in sensitivity. The variation appears as a "cupping" artifact with less sensitivity in the middle than at the periphery of the PRESAGE® cylinder. Polarization effects were also detected for these samples. The fiducial-based ray path measurement scheme, coupled with an iterative reconstruction algorithm, enabled optical CT scanning of PRESAGE® dosimeters immersed in mismatched refractive index solutions. However, improvements to PRESAGE® dose response uniformity are required.
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1986-01-01

Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science.

PubMed

Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H

2014-05-28

Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N(3)) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMPI implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
Iterative solution of the inverse Cauchy problem for an elliptic equation by the conjugate gradient method

NASA Astrophysics Data System (ADS)

Vasil'ev, V. I.; Kardashevsky, A. M.; Popov, V. V.; Prokopev, G. A.

2017-10-01

This article presents results of computational experiment carried out using a finite-difference method for solving the inverse Cauchy problem for a two-dimensional elliptic equation. The computational algorithm involves an iterative determination of the missing boundary condition from the override condition using the conjugate gradient method. The results of calculations are carried out on the examples with exact solutions as well as at specifying an additional condition with random errors are presented. Results showed a high efficiency of the iterative method of conjugate gradients for numerical solution
An iterative solver for the 3D Helmholtz equation

NASA Astrophysics Data System (ADS)

Belonosov, Mikhail; Dmitriev, Maxim; Kostin, Victor; Neklyudov, Dmitry; Tcheverda, Vladimir

2017-09-01

We develop a frequency-domain iterative solver for numerical simulation of acoustic waves in 3D heterogeneous media. It is based on the application of a unique preconditioner to the Helmholtz equation that ensures convergence for Krylov subspace iteration methods. Effective inversion of the preconditioner involves the Fast Fourier Transform (FFT) and numerical solution of a series of boundary value problems for ordinary differential equations. Matrix-by-vector multiplication for iterative inversion of the preconditioned matrix involves inversion of the preconditioner and pointwise multiplication of grid functions. Our solver has been verified by benchmarking against exact solutions and a time-domain solver.
Adapting high-level language programs for parallel processing using data flow

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Computational Issues in Damping Identification for Large Scale Problems

NASA Technical Reports Server (NTRS)

Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.

1997-01-01

Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least squares method. Numerical simulations have been performed on multiple degree-of-freedom models to test the effectiveness of the algorithm and the usefulness of parallel computation for the problems. High Performance Fortran is used to parallelize the algorithm. Tests were performed using the IBM-SP2 at NASA Ames Research Center. The least squares method tested incurs high communication costs, which reduces the benefit of high performance computing. This method's memory requirement grows at a very rapid rate meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can he seen for problems of 100+ degrees-of-freedom.
On the solution of evolution equations based on multigrid and explicit iterative methods

NASA Astrophysics Data System (ADS)

Zhukov, V. T.; Novikova, N. D.; Feodoritova, O. B.

2015-08-01

Two schemes for solving initial-boundary value problems for three-dimensional parabolic equations are studied. One is implicit and is solved using the multigrid method, while the other is explicit iterative and is based on optimal properties of the Chebyshev polynomials. In the explicit iterative scheme, the number of iteration steps and the iteration parameters are chosen as based on the approximation and stability conditions, rather than on the optimization of iteration convergence to the solution of the implicit scheme. The features of the multigrid scheme include the implementation of the intergrid transfer operators for the case of discontinuous coefficients in the equation and the adaptation of the smoothing procedure to the spectrum of the difference operators. The results produced by these schemes as applied to model problems with anisotropic discontinuous coefficients are compared.
Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

NASA Technical Reports Server (NTRS)

Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

1989-01-01

Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
Parallel heuristics for scalable community detection

DOE PAGES

Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth

2015-08-14

Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less
Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald.

PubMed

Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

2014-02-28

In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald

PubMed Central

Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

2015-01-01

In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
Fast inverse scattering solutions using the distorted Born iterative method and the multilevel fast multipole algorithm

PubMed Central

Hesford, Andrew J.; Chew, Weng C.

2010-01-01

The distorted Born iterative method (DBIM) computes iterative solutions to nonlinear inverse scattering problems through successive linear approximations. By decomposing the scattered field into a superposition of scattering by an inhomogeneous background and by a material perturbation, large or high-contrast variations in medium properties can be imaged through iterations that are each subject to the distorted Born approximation. However, the need to repeatedly compute forward solutions still imposes a very heavy computational burden. To ameliorate this problem, the multilevel fast multipole algorithm (MLFMA) has been applied as a forward solver within the DBIM. The MLFMA computes forward solutions in linear time for volumetric scatterers. The typically regular distribution and shape of scattering elements in the inverse scattering problem allow the method to take advantage of data redundancy and reduce the computational demands of the normally expensive MLFMA setup. Additional benefits are gained by employing Kaczmarz-like iterations, where partial measurements are used to accelerate convergence. Numerical results demonstrate both the efficiency of the forward solver and the successful application of the inverse method to imaging problems with dimensions in the neighborhood of ten wavelengths. PMID:20707438
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE PAGES

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...

2017-09-14

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Iterative load-balancing method with multigrid level relaxation for particle simulation with short-range interactions

NASA Astrophysics Data System (ADS)

Furuichi, Mikito; Nishiura, Daisuke

2017-10-01

We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.

Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Performance Analysis of Iterative Channel Estimation and Multiuser Detection in Multipath DS-CDMA Channels

NASA Astrophysics Data System (ADS)

Li, Husheng; Betz, Sharon M.; Poor, H. Vincent

2007-05-01

This paper examines the performance of decision feedback based iterative channel estimation and multiuser detection in channel coded aperiodic DS-CDMA systems operating over multipath fading channels. First, explicit expressions describing the performance of channel estimation and parallel interference cancellation based multiuser detection are developed. These results are then combined to characterize the evolution of the performance of a system that iterates among channel estimation, multiuser detection and channel decoding. Sufficient conditions for convergence of this system to a unique fixed point are developed.
Numerical solutions of nonlinear STIFF initial value problems by perturbed functional iterations

NASA Technical Reports Server (NTRS)

Dey, S. K.

1982-01-01

Numerical solution of nonlinear stiff initial value problems by a perturbed functional iterative scheme is discussed. The algorithm does not fully linearize the system and requires only the diagonal terms of the Jacobian. Some examples related to chemical kinetics are presented.
Accelerating the weighted histogram analysis method by direct inversion in the iterative subspace.

PubMed

Zhang, Cheng; Lai, Chun-Liang; Pettitt, B Montgomery

The weighted histogram analysis method (WHAM) for free energy calculations is a valuable tool to produce free energy differences with the minimal errors. Given multiple simulations, WHAM obtains from the distribution overlaps the optimal statistical estimator of the density of states, from which the free energy differences can be computed. The WHAM equations are often solved by an iterative procedure. In this work, we use a well-known linear algebra algorithm which allows for more rapid convergence to the solution. We find that the computational complexity of the iterative solution to WHAM and the closely-related multiple Bennett acceptance ratio (MBAR) method can be improved by using the method of direct inversion in the iterative subspace. We give examples from a lattice model, a simple liquid and an aqueous protein solution.
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.

2005-01-01

A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategics and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate thc steps in the formulation and large-scale (l,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a duster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environmental and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
An accurate, fast, and scalable solver for high-frequency wave propagation

NASA Astrophysics Data System (ADS)

Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.

2017-12-01

In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straight-forward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages. We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand-sides.
Efficient solution of the simplified P N equations

DOE PAGES

Hamilton, Steven P.; Evans, Thomas M.

2014-12-23

We show new solver strategies for the multigroup SPN equations for nuclear reactor analysis. By forming the complete matrix over space, moments, and energy a robust set of solution strategies may be applied. Moreover, power iteration, shifted power iteration, Rayleigh quotient iteration, Arnoldi's method, and a generalized Davidson method, each using algebraic and physics-based multigrid preconditioners, have been compared on C5G7 MOX test problem as well as an operational PWR model. These results show that the most ecient approach is the generalized Davidson method, that is 30-40 times faster than traditional power iteration and 6-10 times faster than Arnoldi's method.
Performance and capacity analysis of Poisson photon-counting based Iter-PIC OCDMA systems.

PubMed

Li, Lingbin; Zhou, Xiaolin; Zhang, Rong; Zhang, Dingchen; Hanzo, Lajos

2013-11-04

In this paper, an iterative parallel interference cancellation (Iter-PIC) technique is developed for optical code-division multiple-access (OCDMA) systems relying on shot-noise limited Poisson photon-counting reception. The novel semi-analytical tool of extrinsic information transfer (EXIT) charts is used for analysing both the bit error rate (BER) performance as well as the channel capacity of these systems and the results are verified by Monte Carlo simulations. The proposed Iter-PIC OCDMA system is capable of achieving two orders of magnitude BER improvements and a 0.1 nats of capacity improvement over the conventional chip-level OCDMA systems at a coding rate of 1/10.
Locally expansive solutions for a class of iterative equations.

PubMed

Song, Wei; Chen, Sheng

2013-01-01

Iterative equations which can be expressed by the following form f (n) (x) = H(x, f(x), f (2)(x),…, f (n-1)(x)), where n ≥ 2, are investigated. Conditions for the existence of locally expansive C (1) solutions for such equations are given.
Iterative Methods for the Non-LTE Transfer of Polarized Radiation: Resonance Line Polarization in One-dimensional Atmospheres

NASA Astrophysics Data System (ADS)

Trujillo Bueno, Javier; Manso Sainz, Rafael

1999-05-01

This paper shows how to generalize to non-LTE polarization transfer some operator splitting methods that were originally developed for solving unpolarized transfer problems. These are the Jacobi-based accelerated Λ-iteration (ALI) method of Olson, Auer, & Buchler and the iterative schemes based on Gauss-Seidel and successive overrelaxation (SOR) iteration of Trujillo Bueno and Fabiani Bendicho. The theoretical framework chosen for the formulation of polarization transfer problems is the quantum electrodynamics (QED) theory of Landi Degl'Innocenti, which specifies the excitation state of the atoms in terms of the irreducible tensor components of the atomic density matrix. This first paper establishes the grounds of our numerical approach to non-LTE polarization transfer by concentrating on the standard case of scattering line polarization in a gas of two-level atoms, including the Hanle effect due to a weak microturbulent and isotropic magnetic field. We begin demonstrating that the well-known Λ-iteration method leads to the self-consistent solution of this type of problem if one initializes using the ``exact'' solution corresponding to the unpolarized case. We show then how the above-mentioned splitting methods can be easily derived from this simple Λ-iteration scheme. We show that our SOR method is 10 times faster than the Jacobi-based ALI method, while our implementation of the Gauss-Seidel method is 4 times faster. These iterative schemes lead to the self-consistent solution independently of the chosen initialization. The convergence rate of these iterative methods is very high; they do not require either the construction or the inversion of any matrix, and the computing time per iteration is similar to that of the Λ-iteration method.
Novel aspects of plasma control in ITER

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humphreys, D.; Jackson, G.; Walker, M.

2015-02-15

ITER plasma control design solutions and performance requirements are strongly driven by its nuclear mission, aggressive commissioning constraints, and limited number of operational discharges. In addition, high plasma energy content, heat fluxes, neutron fluxes, and very long pulse operation place novel demands on control performance in many areas ranging from plasma boundary and divertor regulation to plasma kinetics and stability control. Both commissioning and experimental operations schedules provide limited time for tuning of control algorithms relative to operating devices. Although many aspects of the control solutions required by ITER have been well-demonstrated in present devices and even designed satisfactorily formore » ITER application, many elements unique to ITER including various crucial integration issues are presently under development. We describe selected novel aspects of plasma control in ITER, identifying unique parts of the control problem and highlighting some key areas of research remaining. Novel control areas described include control physics understanding (e.g., current profile regulation, tearing mode (TM) suppression), control mathematics (e.g., algorithmic and simulation approaches to high confidence robust performance), and integration solutions (e.g., methods for management of highly subscribed control resources). We identify unique aspects of the ITER TM suppression scheme, which will pulse gyrotrons to drive current within a magnetic island, and turn the drive off following suppression in order to minimize use of auxiliary power and maximize fusion gain. The potential role of active current profile control and approaches to design in ITER are discussed. Issues and approaches to fault handling algorithms are described, along with novel aspects of actuator sharing in ITER.« less
Novel aspects of plasma control in ITER

DOE PAGES

Humphreys, David; Ambrosino, G.; de Vries, Peter; ...

2015-02-12

ITER plasma control design solutions and performance requirements are strongly driven by its nuclear mission, aggressive commissioning constraints, and limited number of operational discharges. In addition, high plasma energy content, heat fluxes, neutron fluxes, and very long pulse operation place novel demands on control performance in many areas ranging from plasma boundary and divertor regulation to plasma kinetics and stability control. Both commissioning and experimental operations schedules provide limited time for tuning of control algorithms relative to operating devices. Although many aspects of the control solutions required by ITER have been well-demonstrated in present devices and even designed satisfactorily formore » ITER application, many elements unique to ITER including various crucial integration issues are presently under development. We describe selected novel aspects of plasma control in ITER, identifying unique parts of the control problem and highlighting some key areas of research remaining. Novel control areas described include control physics understanding (e.g. current profile regulation, tearing mode suppression (TM)), control mathematics (e.g. algorithmic and simulation approaches to high confidence robust performance), and integration solutions (e.g. methods for management of highly-subscribed control resources). We identify unique aspects of the ITER TM suppression scheme, which will pulse gyrotrons to drive current within a magnetic island, and turn the drive off following suppression in order to minimize use of auxiliary power and maximize fusion gain. The potential role of active current profile control and approaches to design in ITER are discussed. Finally, issues and approaches to fault handling algorithms are described, along with novel aspects of actuator sharing in ITER.« less
Massively parallel first-principles simulation of electron dynamics in materials

DOE PAGES

Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.; ...

2017-08-01

Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up tomore » 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.« less
Massively parallel first-principles simulation of electron dynamics in materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.

Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up tomore » 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.« less
Training tomorrow's doctors to explain 'medically unexplained' physical symptoms: An examination of UK medical educators' views of barriers and solutions.

PubMed

Joyce, Emmeline; Cowing, Jennifer; Lazarus, Candice; Smith, Charlotte; Zenzuck, Victoria; Peters, Sarah

2018-05-01

Co-occuring physical symptoms, unexplained by organic pathology (known as Functional Syndromes, FS), are common and disabling presentations. However, FS is absent or inconsistently taught within undergraduate medical training. This study investigates the reasons for this and identifies potential solutions to improved implementation. Twenty-eight medical educators from thirteen different UK medical schools participated in semi-structured interviews. Thematic analysis proceeded iteratively, and in parallel with data production. Barriers to implementing FS training are beliefs about the complexity of FS, tutors' negative attitudes towards FS, and FS being perceived as a low priority for the curriculum. In parallel participants recognised FS as ubiquitous within medical practice and erroneously assumed it must be taught by someone. They recommended that students should learn about FS through managed exposure, but only if tutors' negative attitudes and behaviour are also addressed. Negative attitudes towards FS by educators prevents designing and delivering effective education on this common medical presentation. Whilst there is recognition of the need to implement FS training, recommendations are multifaceted. Increased liaison between students, patients and educators is necessary to develop more informed and effective teaching methods for trainee doctors about FS and in order to minimise the impact of the hidden curriculum. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Numerical modeling of the radiative transfer in a turbid medium using the synthetic iteration.

PubMed

Budak, Vladimir P; Kaloshin, Gennady A; Shagalov, Oleg V; Zheltov, Victor S

2015-07-27

In this paper we propose the fast, but the accurate algorithm for numerical modeling of light fields in the turbid media slab. For the numerical solution of the radiative transfer equation (RTE) it is required its discretization based on the elimination of the solution anisotropic part and the replacement of the scattering integral by a finite sum. The solution regular part is determined numerically. A good choice of the method of the solution anisotropic part elimination determines the high convergence of the algorithm in the mean square metric. The method of synthetic iterations can be used to improve the convergence in the uniform metric. A significant increase in the solution accuracy with the use of synthetic iterations allows applying the two-stream approximation for the regular part determination. This approach permits to generalize the proposed method in the case of an arbitrary 3D geometry of the medium.
Shedding New Light on Exploding Stars: Terascale Simulations of Nuetrino-Dreiven Supernovas and Their Nucleosynthesis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dennis C. Smolarski, S.J.

Project Abstract This project was a continuation of work begun under a subcontract issued off of TSI-DOE Grant 1528746, awarded to the University of Illinois Urbana-Champaign. Dr. Anthony Mezzacappa is the Principal Investigator on the Illinois award. A separate award was issued to Santa Clara University to continue the collaboration during the time period May 2003 ? 2004. Smolarski continued to work on preconditioner technology and its interface with various iterative methods. He worked primarily with F. Dough Swesty (SUNY-Stony Brook) in continuing software development started in the 2002-03 academic year. Special attention was paid to the development and testingmore » of difference sparse approximate inverse preconditioners and their use in the solution of linear systems arising from radiation transport equations. The target was a high performance platform on which efficient implementation is a critical component of the overall effort. Smolarski also focused on the integration of the adaptive iterative algorithm, Chebycode, developed by Tom Manteuffel and Steve Ashby and adapted by Ryan Szypowski for parallel platforms, into the radiation transport code being developed at SUNY-Stony Brook.« less
Finite Volume Element (FVE) discretization and multilevel solution of the axisymmetric heat equation

NASA Astrophysics Data System (ADS)

Litaker, Eric T.

1994-12-01

The axisymmetric heat equation, resulting from a point-source of heat applied to a metal block, is solved numerically; both iterative and multilevel solutions are computed in order to compare the two processes. The continuum problem is discretized in two stages: finite differences are used to discretize the time derivatives, resulting is a fully implicit backward time-stepping scheme, and the Finite Volume Element (FVE) method is used to discretize the spatial derivatives. The application of the FVE method to a problem in cylindrical coordinates is new, and results in stencils which are analyzed extensively. Several iteration schemes are considered, including both Jacobi and Gauss-Seidel; a thorough analysis of these schemes is done, using both the spectral radii of the iteration matrices and local mode analysis. Using this discretization, a Gauss-Seidel relaxation scheme is used to solve the heat equation iteratively. A multilevel solution process is then constructed, including the development of intergrid transfer and coarse grid operators. Local mode analysis is performed on the components of the amplification matrix, resulting in the two-level convergence factors for various combinations of the operators. A multilevel solution process is implemented by using multigrid V-cycles; the iterative and multilevel results are compared and discussed in detail. The computational savings resulting from the multilevel process are then discussed.
Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

PubMed

Hielscher, Andreas H; Bartel, Sebastian

2004-02-01

Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
Dissipation, Voltage Profile and Levy Dragon in a Special Ladder Network

ERIC Educational Resources Information Center

Ucak, C.

2009-01-01

A ladder network constructed by an elementary two-terminal network consisting of a parallel resistor-inductor block in series with a parallel resistor-capacitor block sometimes is said to have a non-dispersive dissipative response. This special ladder network is created iteratively by replacing the elementary two-terminal network in place of the…

Robust iterative method for nonlinear Helmholtz equation

NASA Astrophysics Data System (ADS)

Yuan, Lijun; Lu, Ya Yan

2017-08-01

A new iterative method is developed for solving the two-dimensional nonlinear Helmholtz equation which governs polarized light in media with the optical Kerr nonlinearity. In the strongly nonlinear regime, the nonlinear Helmholtz equation could have multiple solutions related to phenomena such as optical bistability and symmetry breaking. The new method exhibits a much more robust convergence behavior than existing iterative methods, such as frozen-nonlinearity iteration, Newton's method and damped Newton's method, and it can be used to find solutions when good initial guesses are unavailable. Numerical results are presented for the scattering of light by a nonlinear circular cylinder based on the exact nonlocal boundary condition and a pseudospectral method in the polar coordinate system.
Optical CT imaging of solid radiochromic dosimeters in mismatched refractive index solutions using a scanning laser and large area detector

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dekker, Kurtis H., E-mail: kdekker2@uwo.ca

Purpose: The practical use of the PRESAGE® solid plastic dosimeter is limited by the inconvenience of immersing it in high-viscosity oils to achieve refractive index matching for optical computed tomography (CT) scanning. The oils are slow to mix and difficult to clean from surfaces, and the dosimeter rotation can generate dynamic Schlieren inhomogeneity patterns in the reference liquid, limiting the rotational and overall scan speed. Therefore, it would be beneficial if lower-viscosity, water-based solutions with slightly unmatched refractive index could be used instead. The purpose of this work is to demonstrate the feasibility of allowing mismatched conditions when using amore » scanning laser system with a large acceptance angle detector. A fiducial-based ray path measurement technique is combined with an iterative CT reconstruction algorithm to reconstruct images. Methods: A water based surrounding liquid with a low viscosity was selected for imaging PRESAGE® solid dosimeters. Liquid selection was optimized to achieve as high a refractive index as possible while avoiding rotation-induced Schlieren effects. This led to a refractive index mismatch of 6% between liquid and dosimeters. Optical CT scans were performed with a fan-beam scanning-laser optical CT system with a large area detector to capture most of the refracted rays. A fiducial marker placed on the wall of a cylindrical sample occludes a given light ray twice. With knowledge of the rotation angle and the radius of the cylindrical object, the actual internal path of each ray through the dosimeter can be calculated. Scans were performed with 1024 projections of 512 data samples each, and rays were rebinned to form 512 parallel-beam projections. Reconstructions were performed on a 512 × 512 grid using 100 iterations of the SIRT iterative CT algorithm. Proof of concept was demonstrated with a uniformly attenuating solution phantom. PRESAGE® dosimeters (11 cm diameter) were irradiated with Cobalt-60 irradiator to achieve either a uniform dose or a 2-level “step-dose” pattern. Results: With 6% refractive index mismatching, a circular field of view of 85% of the diameter of a cylindrical sample can be reconstructed accurately. Reconstructed images of the test solution phantom were uniform (within 3%) inside this radius. However, the dose responses of the PRESAGE® samples were not spatially uniform, with variations of at least 5% in sensitivity. The variation appears as a “cupping” artifact with less sensitivity in the middle than at the periphery of the PRESAGE® cylinder. Polarization effects were also detected for these samples. Conclusions: The fiducial-based ray path measurement scheme, coupled with an iterative reconstruction algorithm, enabled optical CT scanning of PRESAGE® dosimeters immersed in mismatched refractive index solutions. However, improvements to PRESAGE® dose response uniformity are required.« less
Comparison of a 3-D GPU-Assisted Maxwell Code and Ray Tracing for Reflectometry on ITER

NASA Astrophysics Data System (ADS)

Gady, Sarah; Kubota, Shigeyuki; Johnson, Irena

2015-11-01

Electromagnetic wave propagation and scattering in magnetized plasmas are important diagnostics for high temperature plasmas. 1-D and 2-D full-wave codes are standard tools for measurements of the electron density profile and fluctuations; however, ray tracing results have shown that beam propagation in tokamak plasmas is inherently a 3-D problem. The GPU-Assisted Maxwell Code utilizes the FDTD (Finite-Difference Time-Domain) method for solving the Maxwell equations with the cold plasma approximation in a 3-D geometry. Parallel processing with GPGPU (General-Purpose computing on Graphics Processing Units) is used to accelerate the computation. Previously, we reported on initial comparisons of the code results to 1-D numerical and analytical solutions, where the size of the computational grid was limited by the on-board memory of the GPU. In the current study, this limitation is overcome by using domain decomposition and an additional GPU. As a practical application, this code is used to study the current design of the ITER Low Field Side Reflectometer (LSFR) for the Equatorial Port Plug 11 (EPP11). A detailed examination of Gaussian beam propagation in the ITER edge plasma will be presented, as well as comparisons with ray tracing. This work was made possible by funding from the Department of Energy for the Summer Undergraduate Laboratory Internship (SULI) program. This work is supported by the US DOE Contract No.DE-AC02-09CH11466 and DE-FG02-99-ER54527.
An iterative phase-space explicit discontinuous Galerkin method for stellar radiative transfer in extended atmospheres

NASA Astrophysics Data System (ADS)

de Almeida, Valmor F.

2017-07-01

A phase-space discontinuous Galerkin (PSDG) method is presented for the solution of stellar radiative transfer problems. It allows for greater adaptivity than competing methods without sacrificing generality. The method is extensively tested on a spherically symmetric, static, inverse-power-law scattering atmosphere. Results for different sizes of atmospheres and intensities of scattering agreed with asymptotic values. The exponentially decaying behavior of the radiative field in the diffusive-transparent transition region, and the forward peaking behavior at the surface of extended atmospheres were accurately captured. The integrodifferential equation of radiation transfer is solved iteratively by alternating between the radiative pressure equation and the original equation with the integral term treated as an energy density source term. In each iteration, the equations are solved via an explicit, flux-conserving, discontinuous Galerkin method. Finite elements are ordered in wave fronts perpendicular to the characteristic curves so that elemental linear algebraic systems are solved quickly by sweeping the phase space element by element. Two implementations of a diffusive boundary condition at the origin are demonstrated wherein the finite discontinuity in the radiation intensity is accurately captured by the proposed method. This allows for a consistent mechanism to preserve photon luminosity. The method was proved to be robust and fast, and a case is made for the adequacy of parallel processing. In addition to classical two-dimensional plots, results of normalized radiation intensity were mapped onto a log-polar surface exhibiting all distinguishing features of the problem studied.
Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.

PubMed

Palkowski, Marek; Bielecki, Wlodzimierz

2018-01-15

RNA folding is an ongoing compute-intensive task of bioinformatics. Parallelization and improving code locality for this kind of algorithms is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques are within the iteration space slicing framework - the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem at generating parallel tiled code is defining a proper tile size and tile dimension which impact parallelism degree and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (parameters are variables defining tile size). With this purpose, we first generate two nonparametric tiled codes with different fixed tile sizes but with the same code structure and then derive a general affine model, which describes all integer factors available in expressions of those codes. Using this model and known integer factors present in the mentioned expressions (they define the left-hand side of the model), we find unknown integers in this model for each integer factor available in the same fixed tiled code position and replace in this code expressions, including integer factors, with those including parameters. Then we use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover in a given search space the best tile size and tile dimension maximizing target code performance. For a given search space, the presented approach allows us to choose the best tile size and tile dimension in parallel tiled code implementing Nussinov's RNA folding. Experimental results, received on modern Intel multi-core processors, demonstrate that this code outperforms known closely related implementations when the length of RNA strands is bigger than 2500.
Myria: Scalable Analytics as a Service

NASA Astrophysics Data System (ADS)

Howe, B.; Halperin, D.; Whitaker, A.

2014-12-01

At the UW eScience Institute, we're working to empower non-experts, especially in the sciences, to write and use data-parallel algorithms. To this end, we are building Myria, a web-based platform for scalable analytics and data-parallel programming. Myria's internal model of computation is the relational algebra extended with iteration, such that every program is inherently data-parallel, just as every query in a database is inherently data-parallel. But unlike databases, iteration is a first class concept, allowing us to express machine learning tasks, graph traversal tasks, and more. Programs can be expressed in a number of languages and can be executed on a number of execution environments, but we emphasize a particular language called MyriaL that supports both imperative and declarative styles and a particular execution engine called MyriaX that uses an in-memory column-oriented representation and asynchronous iteration. We deliver Myria over the web as a service, providing an editor, performance analysis tools, and catalog browsing features in a single environment. We find that this web-based "delivery vector" is critical in reaching non-experts: they are insulated from irrelevant effort technical work associated with installation, configuration, and resource management. The MyriaX backend, one of several execution runtimes we support, is a main-memory, column-oriented, RDBMS-on-the-worker system that supports cyclic data flows as a first-class citizen and has been shown to outperform competitive systems on 100-machine cluster sizes. I will describe the Myria system, give a demo, and present some new results in large-scale oceanographic microbiology.
High-Order Methods for Incompressible Fluid Flow

NASA Astrophysics Data System (ADS)

Deville, M. O.; Fischer, P. F.; Mund, E. H.

2002-08-01

High-order numerical methods provide an efficient approach to simulating many physical problems. This book considers the range of mathematical, engineering, and computer science topics that form the foundation of high-order numerical methods for the simulation of incompressible fluid flows in complex domains. Introductory chapters present high-order spatial and temporal discretizations for one-dimensional problems. These are extended to multiple space dimensions with a detailed discussion of tensor-product forms, multi-domain methods, and preconditioners for iterative solution techniques. Numerous discretizations of the steady and unsteady Stokes and Navier-Stokes equations are presented, with particular sttention given to enforcement of imcompressibility. Advanced discretizations. implementation issues, and parallel and vector performance are considered in the closing sections. Numerous examples are provided throughout to illustrate the capabilities of high-order methods in actual applications.
Solution of Cubic Equations by Iteration Methods on a Pocket Calculator

ERIC Educational Resources Information Center

Bamdad, Farzad

2004-01-01

A method to provide students a vision of how they can write iteration programs on an inexpensive programmable pocket calculator, without requiring a PC or a graphing calculator is developed. Two iteration methods are used, successive-approximations and bisection methods.
WARP3D-Release 10.8: Dynamic Nonlinear Analysis of Solids using a Preconditioned Conjugate Gradient Software Architecture

NASA Technical Reports Server (NTRS)

Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.

1998-01-01

This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tver-gaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmarks Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by- element, blocked architecture. This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.
Development of Fast Algorithms Using Recursion, Nesting and Iterations for Computational Electromagnetics

NASA Technical Reports Server (NTRS)

Chew, W. C.; Song, J. M.; Lu, C. C.; Weedon, W. H.

1995-01-01

In the first phase of our work, we have concentrated on laying the foundation to develop fast algorithms, including the use of recursive structure like the recursive aggregate interaction matrix algorithm (RAIMA), the nested equivalence principle algorithm (NEPAL), the ray-propagation fast multipole algorithm (RPFMA), and the multi-level fast multipole algorithm (MLFMA). We have also investigated the use of curvilinear patches to build a basic method of moments code where these acceleration techniques can be used later. In the second phase, which is mainly reported on here, we have concentrated on implementing three-dimensional NEPAL on a massively parallel machine, the Connection Machine CM-5, and have been able to obtain some 3D scattering results. In order to understand the parallelization of codes on the Connection Machine, we have also studied the parallelization of 3D finite-difference time-domain (FDTD) code with PML material absorbing boundary condition (ABC). We found that simple algorithms like the FDTD with material ABC can be parallelized very well allowing us to solve within a minute a problem of over a million nodes. In addition, we have studied the use of the fast multipole method and the ray-propagation fast multipole algorithm to expedite matrix-vector multiplication in a conjugate-gradient solution to integral equations of scattering. We find that these methods are faster than LU decomposition for one incident angle, but are slower than LU decomposition when many incident angles are needed as in the monostatic RCS calculations.
Approximate Solution of Time-Fractional Advection-Dispersion Equation via Fractional Variational Iteration Method

PubMed Central

İbiş, Birol

2014-01-01

This paper aims to obtain the approximate solution of time-fractional advection-dispersion equation (FADE) involving Jumarie's modification of Riemann-Liouville derivative by the fractional variational iteration method (FVIM). FVIM provides an analytical approximate solution in the form of a convergent series. Some examples are given and the results indicate that the FVIM is of high accuracy, more efficient, and more convenient for solving time FADEs. PMID:24578662
The multigrid preconditioned conjugate gradient method

NASA Technical Reports Server (NTRS)

Tatebe, Osamu

1993-01-01

A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, it is considered a necessary condition of the multigrid preconditioner in order to satisfy requirements of a preconditioner of the PCG method. Next numerical experiments show a behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in point of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.
A block iterative finite element algorithm for numerical solution of the steady-state, compressible Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Cooke, C. H.

1976-01-01

An iterative method for numerically solving the time independent Navier-Stokes equations for viscous compressible flows is presented. The method is based upon partial application of the Gauss-Seidel principle in block form to the systems of nonlinear algebraic equations which arise in construction of finite element (Galerkin) models approximating solutions of fluid dynamic problems. The C deg-cubic element on triangles is employed for function approximation. Computational results for a free shear flow at Re = 1,000 indicate significant achievement of economy in iterative convergence rate over finite element and finite difference models which employ the customary time dependent equations and asymptotic time marching procedure to steady solution. Numerical results are in excellent agreement with those obtained for the same test problem employing time marching finite element and finite difference solution techniques.
US NDC Modernization Iteration E2 Prototyping Report: User Interface Framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lewis, Jennifer E.; Palmer, Melanie A.; Vickers, James Wallace

2014-12-01

During the second iteration of the US NDC Modernization Elaboration phase (E2), the SNL US NDC Modernization project team completed follow-on Rich Client Platform (RCP) exploratory prototyping related to the User Interface Framework (UIF). The team also developed a survey of browser-based User Interface solutions and completed exploratory prototyping for selected solutions. This report presents the results of the browser-based UI survey, summarizes the E2 browser-based UI and RCP prototyping work, and outlines a path forward for the third iteration of the Elaboration phase (E3).
Performance Implications of Synchronization Support for Parallel FORTRAN Programs

DTIC Science & Technology

1991-06-17

applications we used in this study are BDNA and FLO52. BDNA is a molecular dy- I namics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all 32 I I 100 BDNA -4 FLO52 -I 80 3 CumuilatQe percentage of3
Development of New Methods for the Solution of Nonlinear Differential Equations by the Method of Lie Series and Extension to New Fields.

DTIC Science & Technology

perturbation formulas of Groebner (1960) and Alexseev (1961) for the solution of ordinary differential equations. These formulas are generalized and...iteration methods are given, which include the Methods of Picard, Groebner -Knapp, Poincare, Chen, as special cases. Chapter 3 generalizes an iterated
Convergence characteristics of nonlinear vortex-lattice methods for configuration aerodynamics

NASA Technical Reports Server (NTRS)

Seginer, A.; Rusak, Z.; Wasserstrom, E.

1983-01-01

Nonlinear panel methods have no proof for the existence and uniqueness of their solutions. The convergence characteristics of an iterative, nonlinear vortex-lattice method are, therefore, carefully investigated. The effects of several parameters, including (1) the surface-paneling method, (2) an integration method of the trajectories of the wake vortices, (3) vortex-grid refinement, and (4) the initial conditions for the first iteration on the computed aerodynamic coefficients and on the flow-field details are presented. The convergence of the iterative-solution procedure is usually rapid. The solution converges with grid refinement to a constant value, but the final value is not unique and varies with the wing surface-paneling and wake-discretization methods within some range in the vicinity of the experimental result.
Solution algorithms for nonlinear transient heat conduction analysis employing element-by-element iterative strategies

NASA Technical Reports Server (NTRS)

Winget, J. M.; Hughes, T. J. R.

1985-01-01

The particular problems investigated in the present study arise from nonlinear transient heat conduction. One of two types of nonlinearities considered is related to a material temperature dependence which is frequently needed to accurately model behavior over the range of temperature of engineering interest. The second nonlinearity is introduced by radiation boundary conditions. The finite element equations arising from the solution of nonlinear transient heat conduction problems are formulated. The finite element matrix equations are temporally discretized, and a nonlinear iterative solution algorithm is proposed. Algorithms for solving the linear problem are discussed, taking into account the form of the matrix equations, Gaussian elimination, cost, and iterative techniques. Attention is also given to approximate factorization, implementational aspects, and numerical results.
A Riccati solution for the ideal MHD plasma response with applications to real-time stability control

NASA Astrophysics Data System (ADS)

Glasser, Alexander; Kolemen, Egemen; Glasser, A. H.

2018-03-01

Active feedback control of ideal MHD stability in a tokamak requires rapid plasma stability analysis. Toward this end, we reformulate the δW stability method with a Hamilton-Jacobi theory, elucidating analytical and numerical features of the generic tokamak ideal MHD stability problem. The plasma response matrix is demonstrated to be the solution of an ideal MHD matrix Riccati differential equation. Since Riccati equations are prevalent in the control theory literature, such a shift in perspective brings to bear a range of numerical methods that are well-suited to the robust, fast solution of control problems. We discuss the usefulness of Riccati techniques in solving the stiff ordinary differential equations often encountered in ideal MHD stability analyses—for example, in tokamak edge and stellarator physics. We demonstrate the applicability of such methods to an existing 2D ideal MHD stability code—DCON [A. H. Glasser, Phys. Plasmas 23, 072505 (2016)]—enabling its parallel operation in near real-time, with wall-clock time ≪1 s . Such speed may help enable active feedback ideal MHD stability control, especially in tokamak plasmas whose ideal MHD equilibria evolve with inductive timescale τ≳ 1s—as in ITER.
Verification of continuum drift kinetic equation solvers in NIMROD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Held, E. D.; Ji, J.-Y.; Kruger, S. E.

Verification of continuum solutions to the electron and ion drift kinetic equations (DKEs) in NIMROD [C. R. Sovinec et al., J. Comp. Phys. 195, 355 (2004)] is demonstrated through comparison with several neoclassical transport codes, most notably NEO [E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 54, 015015 (2012)]. The DKE solutions use NIMROD's spatial representation, 2D finite-elements in the poloidal plane and a 1D Fourier expansion in toroidal angle. For 2D velocity space, a novel 1D expansion in finite elements is applied for the pitch angle dependence and a collocation grid is used for the normalized speedmore » coordinate. The full, linearized Coulomb collision operator is kept and shown to be important for obtaining quantitative results. Bootstrap currents, parallel ion flows, and radial particle and heat fluxes show quantitative agreement between NIMROD and NEO for a variety of tokamak equilibria. In addition, velocity space distribution function contours for ions and electrons show nearly identical detailed structure and agree quantitatively. A Θ-centered, implicit time discretization and a block-preconditioned, iterative linear algebra solver provide efficient electron and ion DKE solutions that ultimately will be used to obtain closures for NIMROD's evolving fluid model.« less

Numerical Computation of Subsonic Conical Diffuser Flows with Nonuniform Turbulent Inlet Conditions

DTIC Science & Technology

1977-09-01

Gauss - Seidel Point Iteration Method . . . . . . . . . . . . . . . 7.0 FACTORS AFFECTING THE RATE OF CONVERGENCE OF THE POINT...can be solved in several ways. For simplicity, a standard Gauss - Seidel iteration method is used to obtain the solution . The method updates the...FACTORS AFFECTING THE RATE OF CONVERGENCE OF THE POINT ITERATION ,ŘETHOD The advantage of using the Gauss - Seidel point iteration method to
Newton's method: A link between continuous and discrete solutions of nonlinear problems

NASA Technical Reports Server (NTRS)

Thurston, G. A.

1980-01-01

Newton's method for nonlinear mechanics problems replaces the governing nonlinear equations by an iterative sequence of linear equations. When the linear equations are linear differential equations, the equations are usually solved by numerical methods. The iterative sequence in Newton's method can exhibit poor convergence properties when the nonlinear problem has multiple solutions for a fixed set of parameters, unless the iterative sequences are aimed at solving for each solution separately. The theory of the linear differential operators is often a better guide for solution strategies in applying Newton's method than the theory of linear algebra associated with the numerical analogs of the differential operators. In fact, the theory for the differential operators can suggest the choice of numerical linear operators. In this paper the method of variation of parameters from the theory of linear ordinary differential equations is examined in detail in the context of Newton's method to demonstrate how it might be used as a guide for numerical solutions.
Fast sweeping method for the factored eikonal equation

NASA Astrophysics Data System (ADS)

Fomel, Sergey; Luo, Songting; Zhao, Hongkai

2009-09-01

We develop a fast sweeping method for the factored eikonal equation. By decomposing the solution of a general eikonal equation as the product of two factors: the first factor is the solution to a simple eikonal equation (such as distance) or a previously computed solution to an approximate eikonal equation. The second factor is a necessary modification/correction. Appropriate discretization and a fast sweeping strategy are designed for the equation of the correction part. The key idea is to enforce the causality of the original eikonal equation during the Gauss-Seidel iterations. Using extensive numerical examples we demonstrate that (1) the convergence behavior of the fast sweeping method for the factored eikonal equation is the same as for the original eikonal equation, i.e., the number of iterations for the Gauss-Seidel iterations is independent of the mesh size, (2) the numerical solution from the factored eikonal equation is more accurate than the numerical solution directly computed from the original eikonal equation, especially for point sources.
Stability of the iterative solutions of integral equations as one phase freezing criterion.

PubMed

Fantoni, R; Pastore, G

2003-10-01

A recently proposed connection between the threshold for the stability of the iterative solution of integral equations for the pair correlation functions of a classical fluid and the structural instability of the corresponding real fluid is carefully analyzed. Direct calculation of the Lyapunov exponent of the standard iterative solution of hypernetted chain and Percus-Yevick integral equations for the one-dimensional (1D) hard rods fluid shows the same behavior observed in 3D systems. Since no phase transition is allowed in such 1D system, our analysis shows that the proposed one phase criterion, at least in this case, fails. We argue that the observed proximity between the numerical and the structural instability in 3D originates from the enhanced structure present in the fluid but, in view of the arbitrary dependence on the iteration scheme, it seems uneasy to relate the numerical stability analysis to a robust one-phase criterion for predicting a thermodynamic phase transition.
Beamforming Based Full-Duplex for Millimeter-Wave Communication

PubMed Central

Liu, Xiao; Xiao, Zhenyu; Bai, Lin; Choi, Jinho; Xia, Pengfei; Xia, Xiang-Gen

2016-01-01

In this paper, we study beamforming based full-duplex (FD) systems in millimeter-wave (mmWave) communications. A joint transmission and reception (Tx/Rx) beamforming problem is formulated to maximize the achievable rate by mitigating self-interference (SI). Since the optimal solution is difficult to find due to the non-convexity of the objective function, suboptimal schemes are proposed in this paper. A low-complexity algorithm, which iteratively maximizes signal power while suppressing SI, is proposed and its convergence is proven. Moreover, two closed-form solutions, which do not require iterations, are also derived under minimum-mean-square-error (MMSE), zero-forcing (ZF), and maximum-ratio transmission (MRT) criteria. Performance evaluations show that the proposed iterative scheme converges fast (within only two iterations on average) and approaches an upper-bound performance, while the two closed-form solutions also achieve appealing performances, although there are noticeable differences from the upper bound depending on channel conditions. Interestingly, these three schemes show different robustness against the geometry of Tx/Rx antenna arrays and channel estimation errors. PMID:27455256
DOE Office of Scientific and Technical Information (OSTI.GOV)

Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise tomore » extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)« less
Efficiency trade-offs of steady-state methods using FEM and FDM. [iterative solutions for nonlinear flow equations

NASA Technical Reports Server (NTRS)

Gartling, D. K.; Roache, P. J.

1978-01-01

The efficiency characteristics of finite element and finite difference approximations for the steady-state solution of the Navier-Stokes equations are examined. The finite element method discussed is a standard Galerkin formulation of the incompressible, steady-state Navier-Stokes equations. The finite difference formulation uses simple centered differences that are O(delta x-squared). Operation counts indicate that a rapidly converging Newton-Raphson-Kantorovitch iteration scheme is generally preferable over a Picard method. A split NOS Picard iterative algorithm for the finite difference method was most efficient.
Iterative spectral methods and spectral solutions to compressible flows

NASA Technical Reports Server (NTRS)

Hussaini, M. Y.; Zang, T. A.

1982-01-01

A spectral multigrid scheme is described which can solve pseudospectral discretizations of self-adjoint elliptic problems in O(N log N) operations. An iterative technique for efficiently implementing semi-implicit time-stepping for pseudospectral discretizations of Navier-Stokes equations is discussed. This approach can handle variable coefficient terms in an effective manner. Pseudospectral solutions of compressible flow problems are presented. These include one dimensional problems and two dimensional Euler solutions. Results are given both for shock-capturing approaches and for shock-fitting ones.
Aerodynamic optimization by simultaneously updating flow variables and design parameters with application to advanced propeller designs

NASA Technical Reports Server (NTRS)

Rizk, Magdi H.

1988-01-01

A scheme is developed for solving constrained optimization problems in which the objective function and the constraint function are dependent on the solution of the nonlinear flow equations. The scheme updates the design parameter iterative solutions and the flow variable iterative solutions simultaneously. It is applied to an advanced propeller design problem with the Euler equations used as the flow governing equations. The scheme's accuracy, efficiency and sensitivity to the computational parameters are tested.
Fast in-memory elastic full-waveform inversion using consumer-grade GPUs

NASA Astrophysics Data System (ADS)

Sivertsen Bergslid, Tore; Birger Raknes, Espen; Arntsen, Børge

2017-04-01

Full-waveform inversion (FWI) is a technique to estimate subsurface properties by using the recorded waveform produced by a seismic source and applying inverse theory. This is done through an iterative optimization procedure, where each iteration requires solving the wave equation many times, then trying to minimize the difference between the modeled and the measured seismic data. Having to model many of these seismic sources per iteration means that this is a highly computationally demanding procedure, which usually involves writing a lot of data to disk. We have written code that does forward modeling and inversion entirely in memory. A typical HPC cluster has many more CPUs than GPUs. Since FWI involves modeling many seismic sources per iteration, the obvious approach is to parallelize the code on a source-by-source basis, where each core of the CPU performs one modeling, and do all modelings simultaneously. With this approach, the GPU is already at a major disadvantage in pure numbers. Fortunately, GPUs can more than make up for this hardware disadvantage by performing each modeling much faster than a CPU. Another benefit of parallelizing each individual modeling is that it lets each modeling use a lot more RAM. If one node has 128 GB of RAM and 20 CPU cores, each modeling can use only 6.4 GB RAM if one is running the node at full capacity with source-by-source parallelization on the CPU. A parallelized per-source code using GPUs can use 64 GB RAM per modeling. Whenever a modeling uses more RAM than is available and has to start using regular disk space the runtime increases dramatically, due to slow file I/O. The extremely high computational speed of the GPUs combined with the large amount of RAM available for each modeling lets us do high frequency FWI for fairly large models very quickly. For a single modeling, our GPU code outperforms the single-threaded CPU-code by a factor of about 75. Successful inversions have been run on data with frequencies up to 40 Hz for a model of 2001 by 600 grid points with 5 m grid spacing and 5000 time steps, in less than 2.5 minutes per source. In practice, using 15 nodes (30 GPUs) to model 101 sources, each iteration took approximately 9 minutes. For reference, the same inversion run with our CPU code uses two hours per iteration. This was done using only a very simple wavefield interpolation technique, saving every second timestep. Using a more sophisticated checkpointing or wavefield reconstruction method would allow us to increase this model size significantly. Our results show that ordinary gaming GPUs are a viable alternative to the expensive professional GPUs often used today, when performing large scale modeling and inversion in geophysics.
An iterative transformation procedure for numerical solution of flutter and similar characteristics-value problems

NASA Technical Reports Server (NTRS)

Gossard, Myron L

1952-01-01

An iterative transformation procedure suggested by H. Wielandt for numerical solution of flutter and similar characteristic-value problems is presented. Application of this procedure to ordinary natural-vibration problems and to flutter problems is shown by numerical examples. Comparisons of computed results with experimental values and with results obtained by other methods of analysis are made.
Parallel processing in finite element structural analysis

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1987-01-01

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
2D Seismic Imaging of Elastic Parameters by Frequency Domain Full Waveform Inversion

NASA Astrophysics Data System (ADS)

Brossier, R.; Virieux, J.; Operto, S.

2008-12-01

Thanks to recent advances in parallel computing, full waveform inversion is today a tractable seismic imaging method to reconstruct physical parameters of the earth interior at different scales ranging from the near- surface to the deep crust. We present a massively parallel 2D frequency-domain full-waveform algorithm for imaging visco-elastic media from multi-component seismic data. The forward problem (i.e. the resolution of the frequency-domain 2D PSV elastodynamics equations) is based on low-order Discontinuous Galerkin (DG) method (P0 and/or P1 interpolations). Thanks to triangular unstructured meshes, the DG method allows accurate modeling of both body waves and surface waves in case of complex topography for a discretization of 10 to 15 cells per shear wavelength. The frequency-domain DG system is solved efficiently for multiple sources with the parallel direct solver MUMPS. The local inversion procedure (i.e. minimization of residuals between observed and computed data) is based on the adjoint-state method which allows to efficiently compute the gradient of the objective function. Applying the inversion hierarchically from the low frequencies to the higher ones defines a multiresolution imaging strategy which helps convergence towards the global minimum. In place of expensive Newton algorithm, the combined use of the diagonal terms of the approximate Hessian matrix and optimization algorithms based on quasi-Newton methods (Conjugate Gradient, LBFGS, ...) allows to improve the convergence of the iterative inversion. The distribution of forward problem solutions over processors driven by a mesh partitioning performed by METIS allows to apply most of the inversion in parallel. We shall present the main features of the parallel modeling/inversion algorithm, assess its scalability and illustrate its performances with realistic synthetic case studies.
Program for the solution of multipoint boundary value problems of quasilinear differential equations

NASA Technical Reports Server (NTRS)

1973-01-01

Linear equations are solved by a method of superposition of solutions of a sequence of initial value problems. For nonlinear equations and/or boundary conditions, the solution is iterative and in each iteration a problem like the linear case is solved. A simple Taylor series expansion is used for the linearization of both nonlinear equations and nonlinear boundary conditions. The perturbation method of solution is used in preference to quasilinearization because of programming ease, and smaller storage requirements; and experiments indicate that the desired convergence properties exist although no proof or convergence is given.
Broad-search algorithms for the spacecraft trajectory design of Callisto-Ganymede-Io triple flyby sequences from 2024 to 2040, Part II: Lambert pathfinding and trajectory solutions

NASA Astrophysics Data System (ADS)

Lynam, Alfred E.

2014-01-01

Triple-satellite-aided capture employs gravity-assist flybys of three of the Galilean moons of Jupiter in order to decrease the amount of ΔV required to capture a spacecraft into Jupiter orbit. Similarly, triple flybys can be used within a Jupiter satellite tour to rapidly modify the orbital parameters of a Jovicentric orbit, or to increase the number of science flybys. In order to provide a nearly comprehensive search of the solution space of Callisto-Ganymede-Io triple flybys from 2024 to 2040, a third-order, Chebyshev's method variant of the p-iteration solution to Lambert's problem is paired with a second-order, Newton-Raphson method, time of flight iteration solution to the V∞-matching problem. The iterative solutions of these problems provide the orbital parameters of the Callisto-Ganymede transfer, the Ganymede flyby, and the Ganymede-Io transfer, but the characteristics of the Callisto and Io flybys are unconstrained, so they are permitted to vary in order to produce an even larger number of trajectory solutions. The vast amount of solution data is searched to find the best triple-satellite-aided capture window between 2024 and 2040.
A free software for pore-scale modelling: solving Stokes equation for velocity fields and permeability values in 3D pore geometries

NASA Astrophysics Data System (ADS)

Gerke, Kirill; Vasilyev, Roman; Khirevich, Siarhei; Karsanina, Marina; Collins, Daniel; Korost, Dmitry; Mallants, Dirk

2015-04-01

In this contribution we introduce a novel free software which solves the Stokes equation to obtain velocity fields for low Reynolds-number flows within externally generated 3D pore geometries. Provided with velocity fields, one can calculate permeability for known pressure gradient boundary conditions via Darcy's equation. Finite-difference schemes of 2nd and 4th order of accuracy are used together with an artificial compressibility method to iteratively converge to a steady-state solution of Stokes' equation. This numerical approach is much faster and less computationally demanding than the majority of open-source or commercial softwares employing other algorithms (finite elements/volumes, lattice Boltzmann, etc.) The software consists of two parts: 1) a pre and post-processing graphical interface, and 2) a solver. The latter is efficiently parallelized to use any number of available cores (the speedup on 16 threads was up to 10-12 depending on hardware). Due to parallelization and memory optimization our software can be used to obtain solutions for 300x300x300 voxels geometries on modern desktop PCs. The software was successfully verified by testing it against lattice Boltzmann simulations and analytical solutions. To illustrate the software's applicability for numerous problems in Earth Sciences, a number of case studies have been developed: 1) identifying the representative elementary volume for permeability determination within a sandstone sample, 2) derivation of permeability/hydraulic conductivity values for rock and soil samples and comparing those with experimentally obtained values, 3) revealing the influence of the amount of fine-textured material such as clay on filtration properties of sandy soil. This work was partially supported by RSF grant 14-17-00658 (pore-scale modelling) and RFBR grants 13-04-00409-a and 13-05-01176-a.
A Kronecker product splitting preconditioner for two-dimensional space-fractional diffusion equations

NASA Astrophysics Data System (ADS)

Chen, Hao; Lv, Wen; Zhang, Tongtong

2018-05-01

We study preconditioned iterative methods for the linear system arising in the numerical discretization of a two-dimensional space-fractional diffusion equation. Our approach is based on a formulation of the discrete problem that is shown to be the sum of two Kronecker products. By making use of an alternating Kronecker product splitting iteration technique we establish a class of fixed-point iteration methods. Theoretical analysis shows that the new method converges to the unique solution of the linear system. Moreover, the optimal choice of the involved iteration parameters and the corresponding asymptotic convergence rate are computed exactly when the eigenvalues of the system matrix are all real. The basic iteration is accelerated by a Krylov subspace method like GMRES. The corresponding preconditioner is in a form of a Kronecker product structure and requires at each iteration the solution of a set of discrete one-dimensional fractional diffusion equations. We use structure preserving approximations to the discrete one-dimensional fractional diffusion operators in the action of the preconditioning matrix. Numerical examples are presented to illustrate the effectiveness of this approach.
Iterative nonlinear joint transform correlation for the detection of objects in cluttered scenes

NASA Astrophysics Data System (ADS)

Haist, Tobias; Tiziani, Hans J.

1999-03-01

An iterative correlation technique with digital image processing in the feedback loop for the detection of small objects in cluttered scenes is proposed. A scanning aperture is combined with the method in order to improve the immunity against noise and clutter. Multiple reference objects or different views of one object are processed in parallel. We demonstrate the method by detecting a noisy and distorted face in a crowd with a nonlinear joint transform correlator.
Global Asymptotic Behavior of Iterative Implicit Schemes

NASA Technical Reports Server (NTRS)

Yee, H. C.; Sweby, P. K.

1994-01-01

The global asymptotic nonlinear behavior of some standard iterative procedures in solving nonlinear systems of algebraic equations arising from four implicit linear multistep methods (LMMs) in discretizing three models of 2 x 2 systems of first-order autonomous nonlinear ordinary differential equations (ODEs) is analyzed using the theory of dynamical systems. The iterative procedures include simple iteration and full and modified Newton iterations. The results are compared with standard Runge-Kutta explicit methods, a noniterative implicit procedure, and the Newton method of solving the steady part of the ODEs. Studies showed that aside from exhibiting spurious asymptotes, all of the four implicit LMMs can change the type and stability of the steady states of the differential equations (DEs). They also exhibit a drastic distortion but less shrinkage of the basin of attraction of the true solution than standard nonLMM explicit methods. The simple iteration procedure exhibits behavior which is similar to standard nonLMM explicit methods except that spurious steady-state numerical solutions cannot occur. The numerical basins of attraction of the noniterative implicit procedure mimic more closely the basins of attraction of the DEs and are more efficient than the three iterative implicit procedures for the four implicit LMMs. Contrary to popular belief, the initial data using the Newton method of solving the steady part of the DEs may not have to be close to the exact steady state for convergence. These results can be used as an explanation for possible causes and cures of slow convergence and nonconvergence of steady-state numerical solutions when using an implicit LMM time-dependent approach in computational fluid dynamics.
Phase Reconstruction from FROG Using Genetic Algorithms[Frequency-Resolved Optical Gating

DOE Office of Scientific and Technical Information (OSTI.GOV)

Omenetto, F.G.; Nicholson, J.W.; Funk, D.J.

1999-04-12

The authors describe a new technique for obtaining the phase and electric field from FROG measurements using genetic algorithms. Frequency-Resolved Optical Gating (FROG) has gained prominence as a technique for characterizing ultrashort pulses. FROG consists of a spectrally resolved autocorrelation of the pulse to be measured. Typically a combination of iterative algorithms is used, applying constraints from experimental data, and alternating between the time and frequency domain, in order to retrieve an optical pulse. The authors have developed a new approach to retrieving the intensity and phase from FROG data using a genetic algorithm (GA). A GA is a generalmore » parallel search technique that operates on a population of potential solutions simultaneously. Operators in a genetic algorithm, such as crossover, selection, and mutation are based on ideas taken from evolution.« less

Information flow in layered networks of non-monotonic units

NASA Astrophysics Data System (ADS)

Schittler Neves, Fabio; Martim Schubert, Benno; Erichsen, Rubem, Jr.

2015-07-01

Layered neural networks are feedforward structures that yield robust parallel and distributed pattern recognition. Even though much attention has been paid to pattern retrieval properties in such systems, many aspects of their dynamics are not yet well characterized or understood. In this work we study, at different temperatures, the memory activity and information flows through layered networks in which the elements are the simplest binary odd non-monotonic function. Our results show that, considering a standard Hebbian learning approach, the network information content has its maximum always at the monotonic limit, even though the maximum memory capacity can be found at non-monotonic values for small enough temperatures. Furthermore, we show that such systems exhibit rich macroscopic dynamics, including not only fixed point solutions of its iterative map, but also cyclic and chaotic attractors that also carry information.
Numerical modeling for the retrofit of the hydraulic cooling subsystems in operating power plant

NASA Astrophysics Data System (ADS)

AlSaqoor, S.; Alahmer, A.; Al Quran, F.; Andruszkiewicz, A.; Kubas, K.; Regucki, P.; Wędrychowicz, W.

2017-08-01

This paper presents the possibility of using the numerical methods to analyze the work of hydraulic systems on the example of a cooling system of a power boiler auxiliary devices. The variety of conditions at which hydraulic system that operated in specific engineering subsystems requires an individualized approach to the model solutions that have been developed for these systems modernizing. A mathematical model of a series-parallel propagation for the cooling water was derived and iterative methods were used to solve the system of nonlinear equations. The results of numerical calculations made it possible to analyze different variants of a modernization of the studied system and to indicate its critical elements. An economic analysis of different options allows an investor to choose an optimal variant of a reconstruction of the installation.
Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications

NASA Astrophysics Data System (ADS)

Shimojo, Fuyuki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

2008-02-01

A linear-scaling algorithm based on a divide-and-conquer (DC) scheme has been designed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory (DFT). Electronic wave functions are represented on a real-space grid, which is augmented with a coarse multigrid to accelerate the convergence of iterative solutions and with adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid DC-DFT algorithm on massively parallel computers. The largest benchmark tests include 11.8×106 -atom ( 1.04×1012 electronic degrees of freedom) calculation on 131 072 IBM BlueGene/L processors. The DC-DFT algorithm has well-defined parameters to control the data locality, with which the solutions converge rapidly. Also, the total energy is well conserved during the MD simulation. We perform first-principles MD simulations based on the DC-DFT algorithm, in which large system sizes bring in excellent agreement with x-ray scattering measurements for the pair-distribution function of liquid Rb and allow the description of low-frequency vibrational modes of graphene. The band gap of a CdSe nanorod calculated by the DC-DFT algorithm agrees well with the available conventional DFT results. With the DC-DFT algorithm, the band gap is calculated for larger system sizes until the result reaches the asymptotic value.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

NASA Technical Reports Server (NTRS)

Straeter, T. A.; Markos, A. T.

1975-01-01

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
An iterative phase-space explicit discontinuous Galerkin method for stellar radiative transfer in extended atmospheres

DOE Office of Scientific and Technical Information (OSTI.GOV)

de Almeida, Valmor F.

In this work, a phase-space discontinuous Galerkin (PSDG) method is presented for the solution of stellar radiative transfer problems. It allows for greater adaptivity than competing methods without sacrificing generality. The method is extensively tested on a spherically symmetric, static, inverse-power-law scattering atmosphere. Results for different sizes of atmospheres and intensities of scattering agreed with asymptotic values. The exponentially decaying behavior of the radiative field in the diffusive-transparent transition region, and the forward peaking behavior at the surface of extended atmospheres were accurately captured. The integrodifferential equation of radiation transfer is solved iteratively by alternating between the radiative pressure equationmore » and the original equation with the integral term treated as an energy density source term. In each iteration, the equations are solved via an explicit, flux-conserving, discontinuous Galerkin method. Finite elements are ordered in wave fronts perpendicular to the characteristic curves so that elemental linear algebraic systems are solved quickly by sweeping the phase space element by element. Two implementations of a diffusive boundary condition at the origin are demonstrated wherein the finite discontinuity in the radiation intensity is accurately captured by the proposed method. This allows for a consistent mechanism to preserve photon luminosity. The method was proved to be robust and fast, and a case is made for the adequacy of parallel processing. In addition to classical two-dimensional plots, results of normalized radiation intensity were mapped onto a log-polar surface exhibiting all distinguishing features of the problem studied.« less
An iterative phase-space explicit discontinuous Galerkin method for stellar radiative transfer in extended atmospheres

DOE PAGES

de Almeida, Valmor F.

2017-04-19

In this work, a phase-space discontinuous Galerkin (PSDG) method is presented for the solution of stellar radiative transfer problems. It allows for greater adaptivity than competing methods without sacrificing generality. The method is extensively tested on a spherically symmetric, static, inverse-power-law scattering atmosphere. Results for different sizes of atmospheres and intensities of scattering agreed with asymptotic values. The exponentially decaying behavior of the radiative field in the diffusive-transparent transition region, and the forward peaking behavior at the surface of extended atmospheres were accurately captured. The integrodifferential equation of radiation transfer is solved iteratively by alternating between the radiative pressure equationmore » and the original equation with the integral term treated as an energy density source term. In each iteration, the equations are solved via an explicit, flux-conserving, discontinuous Galerkin method. Finite elements are ordered in wave fronts perpendicular to the characteristic curves so that elemental linear algebraic systems are solved quickly by sweeping the phase space element by element. Two implementations of a diffusive boundary condition at the origin are demonstrated wherein the finite discontinuity in the radiation intensity is accurately captured by the proposed method. This allows for a consistent mechanism to preserve photon luminosity. The method was proved to be robust and fast, and a case is made for the adequacy of parallel processing. In addition to classical two-dimensional plots, results of normalized radiation intensity were mapped onto a log-polar surface exhibiting all distinguishing features of the problem studied.« less
GPU-accelerated regularized iterative reconstruction for few-view cone beam CT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Matenine, Dmitri, E-mail: dmitri.matenine.1@ulaval.ca; Goussard, Yves, E-mail: yves.goussard@polymtl.ca; Després, Philippe, E-mail: philippe.despres@phy.ulaval.ca

2015-04-15

Purpose: The present work proposes an iterative reconstruction technique designed for x-ray transmission computed tomography (CT). The main objective is to provide a model-based solution to the cone-beam CT reconstruction problem, yielding accurate low-dose images via few-views acquisitions in clinically acceptable time frames. Methods: The proposed technique combines a modified ordered subsets convex (OSC) algorithm and the total variation minimization (TV) regularization technique and is called OSC-TV. The number of subsets of each OSC iteration follows a reduction pattern in order to ensure the best performance of the regularization method. Considering the high computational cost of the algorithm, it ismore » implemented on a graphics processing unit, using parallelization to accelerate computations. Results: The reconstructions were performed on computer-simulated as well as human pelvic cone-beam CT projection data and image quality was assessed. In terms of convergence and image quality, OSC-TV performs well in reconstruction of low-dose cone-beam CT data obtained via a few-view acquisition protocol. It compares favorably to the few-view TV-regularized projections onto convex sets (POCS-TV) algorithm. It also appears to be a viable alternative to full-dataset filtered backprojection. Execution times are of 1–2 min and are compatible with the typical clinical workflow for nonreal-time applications. Conclusions: Considering the image quality and execution times, this method may be useful for reconstruction of low-dose clinical acquisitions. It may be of particular benefit to patients who undergo multiple acquisitions by reducing the overall imaging radiation dose and associated risks.« less
Application of Four-Point Newton-EGSOR iteration for the numerical solution of 2D Porous Medium Equations

NASA Astrophysics Data System (ADS)

Chew, J. V. L.; Sulaiman, J.

2017-09-01

Partial differential equations that are used in describing the nonlinear heat and mass transfer phenomena are difficult to be solved. For the case where the exact solution is difficult to be obtained, it is necessary to use a numerical procedure such as the finite difference method to solve a particular partial differential equation. In term of numerical procedure, a particular method can be considered as an efficient method if the method can give an approximate solution within the specified error with the least computational complexity. Throughout this paper, the two-dimensional Porous Medium Equation (2D PME) is discretized by using the implicit finite difference scheme to construct the corresponding approximation equation. Then this approximation equation yields a large-sized and sparse nonlinear system. By using the Newton method to linearize the nonlinear system, this paper deals with the application of the Four-Point Newton-EGSOR (4NEGSOR) iterative method for solving the 2D PMEs. In addition to that, the efficiency of the 4NEGSOR iterative method is studied by solving three examples of the problems. Based on the comparative analysis, the Newton-Gauss-Seidel (NGS) and the Newton-SOR (NSOR) iterative methods are also considered. The numerical findings show that the 4NEGSOR method is superior to the NGS and the NSOR methods in terms of the number of iterations to get the converged solutions, the time of computation and the maximum absolute errors produced by the methods.
irGPU.proton.Net: Irregular strong charge interaction networks of protonatable groups in protein molecules--a GPU solver using the fast multipole method and statistical thermodynamics.

PubMed

Kantardjiev, Alexander A

2015-04-05

A cluster of strongly interacting ionization groups in protein molecules with irregular ionization behavior is suggestive for specific structure-function relationship. However, their computational treatment is unconventional (e.g., lack of convergence in naive self-consistent iterative algorithm). The stringent evaluation requires evaluation of Boltzmann averaged statistical mechanics sums and electrostatic energy estimation for each microstate. irGPU: Irregular strong interactions in proteins--a GPU solver is novel solution to a versatile problem in protein biophysics--atypical protonation behavior of coupled groups. The computational severity of the problem is alleviated by parallelization (via GPU kernels) which is applied for the electrostatic interaction evaluation (including explicit electrostatics via the fast multipole method) as well as statistical mechanics sums (partition function) estimation. Special attention is given to the ease of the service and encapsulation of theoretical details without sacrificing rigor of computational procedures. irGPU is not just a solution-in-principle but a promising practical application with potential to entice community into deeper understanding of principles governing biomolecule mechanisms. © 2015 Wiley Periodicals, Inc.
Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

PubMed

Liang, Faming; Kim, Jinsu; Song, Qifan

2016-01-01

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

PubMed Central

Kim, Jinsu; Song, Qifan

2016-01-01

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469
Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu

2006-04-17

TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
A novel highly parallel algorithm for linearly unmixing hyperspectral images

NASA Astrophysics Data System (ADS)

Guerra, Raúl; López, Sebastián.; Callico, Gustavo M.; López, Jose F.; Sarmiento, Roberto

2014-10-01

Endmember extraction and abundances calculation represent critical steps within the process of linearly unmixing a given hyperspectral image because of two main reasons. The first one is due to the need of computing a set of accurate endmembers in order to further obtain confident abundance maps. The second one refers to the huge amount of operations involved in these time-consuming processes. This work proposes an algorithm to estimate the endmembers of a hyperspectral image under analysis and its abundances at the same time. The main advantage of this algorithm is its high parallelization degree and the mathematical simplicity of the operations implemented. This algorithm estimates the endmembers as virtual pixels. In particular, the proposed algorithm performs the descent gradient method to iteratively refine the endmembers and the abundances, reducing the mean square error, according with the linear unmixing model. Some mathematical restrictions must be added so the method converges in a unique and realistic solution. According with the algorithm nature, these restrictions can be easily implemented. The results obtained with synthetic images demonstrate the well behavior of the algorithm proposed. Moreover, the results obtained with the well-known Cuprite dataset also corroborate the benefits of our proposal.
FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics

NASA Astrophysics Data System (ADS)

Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio

2016-12-01

Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretizations in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and it uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after operating static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to efficiently deal with a large number of processes, FaCSI exploits efficient single field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performances of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the cores count (scalability of the preconditioner) and on the mesh size (optimality).
Steady state numerical solutions for determining the location of MEMS on projectile

NASA Astrophysics Data System (ADS)

Abiprayu, K.; Abdigusna, M. F. F.; Gunawan, P. H.

2018-03-01

This paper is devoted to compare the numerical solutions for the steady and unsteady state heat distribution model on projectile. Here, the best location for installing of the MEMS on the projectile based on the surface temperature is investigated. Numerical iteration methods, Jacobi and Gauss-Seidel have been elaborated to solve the steady state heat distribution model on projectile. The results using Jacobi and Gauss-Seidel are shown identical but the discrepancy iteration cost for each methods is gained. Using Jacobi’s method, the iteration cost is 350 iterations. Meanwhile, using Gauss-Seidel 188 iterations are obtained, faster than the Jacobi’s method. The comparison of the simulation by steady state model and the unsteady state model by a reference is shown satisfying. Moreover, the best candidate for installing MEMS on projectile is observed at pointT(10, 0) which has the lowest temperature for the other points. The temperature using Jacobi and Gauss-Seidel for scenario 1 and 2 atT(10, 0) are 307 and 309 Kelvin respectively.
Iterative image reconstruction for PROPELLER-MRI using the nonuniform fast fourier transform.

PubMed

Tamhane, Ashish A; Anastasio, Mark A; Gui, Minzhi; Arfanakis, Konstantinos

2010-07-01

To investigate an iterative image reconstruction algorithm using the nonuniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction) MRI. Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it with that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased signal to noise ratio, reduced artifacts, for similar spatial resolution, compared with gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter, the new reconstruction technique may provide PROPELLER images with improved image quality compared with conventional gridding. (c) 2010 Wiley-Liss, Inc.
Iterative Image Reconstruction for PROPELLER-MRI using the NonUniform Fast Fourier Transform

PubMed Central

Tamhane, Ashish A.; Anastasio, Mark A.; Gui, Minzhi; Arfanakis, Konstantinos

2013-01-01

Purpose To investigate an iterative image reconstruction algorithm using the non-uniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping parallEL Lines with Enhanced Reconstruction) MRI. Materials and Methods Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it to that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. Results It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased SNR, reduced artifacts, for similar spatial resolution, compared to gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. Conclusion An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter the new reconstruction technique may provide PROPELLER images with improved image quality compared to conventional gridding. PMID:20578028
Exploring the Ability of a Coarse-grained Potential to Describe the Stress-strain Response of Glassy Polystyrene

DTIC Science & Technology

2012-10-01

using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS

Real-Time, Multiple, Pan/Tilt/Zoom, Computer Vision Tracking, and 3D Position Estimating System for Unmanned Aerial System Metrology

DTIC Science & Technology

2013-10-18

of the enclosed tasks plus the last parallel task for a total of five parallel tasks for one iteration, i). for j = 1…N for i = 1… 8 Figure...drizzling juices culminating in a state of salivating desire to cut a piece and enjoy. On the other hand, the smell could be that of a pungent, unpleasant
A New Newton-Like Iterative Method for Roots of Analytic Functions

ERIC Educational Resources Information Center

Otolorin, Olayiwola

2005-01-01

A new Newton-like iterative formula for the solution of non-linear equations is proposed. To derive the formula, the convergence criteria of the one-parameter iteration formula, and also the quasilinearization in the derivation of Newton's formula are reviewed. The result is a new formula which eliminates the limitations of other methods. There is…
An iterative method for the Helmholtz equation

NASA Technical Reports Server (NTRS)

Bayliss, A.; Goldstein, C. I.; Turkel, E.

1983-01-01

An iterative algorithm for the solution of the Helmholtz equation is developed. The algorithm is based on a preconditioned conjugate gradient iteration for the normal equations. The preconditioning is based on an SSOR sweep for the discrete Laplacian. Numerical results are presented for a wide variety of problems of physical interest and demonstrate the effectiveness of the algorithm.
Iterative methods for mixed finite element equations

NASA Technical Reports Server (NTRS)

Nakazawa, S.; Nagtegaal, J. C.; Zienkiewicz, O. C.

1985-01-01

Iterative strategies for the solution of indefinite system of equations arising from the mixed finite element method are investigated in this paper with application to linear and nonlinear problems in solid and structural mechanics. The augmented Hu-Washizu form is derived, which is then utilized to construct a family of iterative algorithms using the displacement method as the preconditioner. Two types of iterative algorithms are implemented. Those are: constant metric iterations which does not involve the update of preconditioner; variable metric iterations, in which the inverse of the preconditioning matrix is updated. A series of numerical experiments is conducted to evaluate the numerical performance with application to linear and nonlinear model problems.
Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

NASA Astrophysics Data System (ADS)

Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

2016-07-01

Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
Leveraging Anderson Acceleration for improved convergence of iterative solutions to transport systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Willert, Jeffrey; Taitano, William T.; Knoll, Dana

In this note we demonstrate that using Anderson Acceleration (AA) in place of a standard Picard iteration can not only increase the convergence rate but also make the iteration more robust for two transport applications. We also compare the convergence acceleration provided by AA to that provided by moment-based acceleration methods. Additionally, we demonstrate that those two acceleration methods can be used together in a nested fashion. We begin by describing the AA algorithm. At this point, we will describe two application problems, one from neutronics and one from plasma physics, on which we will apply AA. We provide computationalmore » results which highlight the benefits of using AA, namely that we can compute solutions using fewer function evaluations, larger time-steps, and achieve a more robust iteration.« less
Numerical Aspects of Nonhydrostatic Implementations Applied to a Parallel Finite Element Tsunami Model

NASA Astrophysics Data System (ADS)

Fuchs, A.; Androsov, A.; Harig, S.; Hiller, W.; Rakowsky, N.

2012-04-01

Based on the jeopardy of devastating tsunamis and the unpredictability of such events, tsunami modelling as part of warning systems is still a contemporary topic. The tsunami group of Alfred Wegener Institute developed the simulation tool TsunAWI as contribution to the Early Warning System in Indonesia. Although the precomputed scenarios for this purpose qualify for satisfying deliverables, the study of further improvements continues. While TsunAWI is governed by the Shallow Water Equations, an extension of the model is based on a nonhydrostatic approach. At the arrival of a tsunami wave in coastal regions with rough bathymetry, the term containing the nonhydrostatic part of pressure, that is neglected in the original hydrostatic model, gains in importance. In consideration of this term, a better approximation of the wave is expected. Differences of hydrostatic and nonhydrostatic model results are contrasted in the standard benchmark problem of a solitary wave runup on a plane beach. The observation data provided by Titov and Synolakis (1995) serves as reference. The nonhydrostatic approach implies a set of equations that are similar to the Shallow Water Equations, so the variation of the code can be implemented on top. However, this additional routines cause a lot of issues you have to cope with. So far the computations of the model were purely explicit. In the nonhydrostatic version the determination of an additional unknown and the solution of a large sparse system of linear equations is necessary. The latter constitutes the lion's share of computing time and memory requirement. Since the corresponding matrix is only symmetric in structure and not in values, an iterative Krylov Subspace Method is used, in particular the restarted Generalized Minimal Residual Algorithm GMRES(m). With regard to optimization, we present a comparison of several combinations of sequential and parallel preconditioning techniques respective number of iterations and setup/application time. Since the used software package pARMS 3.2, that provides solving and preconditioning techniques, works via MPI parallelism, in an auxiliary branch we adapted TsunAWI and switched from OpenMP to MPI with attached importance to internal partition management.
Nested Krylov methods and preserving the orthogonality

NASA Technical Reports Server (NTRS)

Desturler, Eric; Fokkema, Diederik R.

1993-01-01

Recently the GMRESR inner-outer iteraction scheme for the solution of linear systems of equations was proposed by Van der Vorst and Vuik. Similar methods have been proposed by Axelsson and Vassilevski and Saad (FGMRES). The outer iteration is GCR, which minimizes the residual over a given set of direction vectors. The inner iteration is GMRES, which at each step computes a new direction vector by approximately solving the residual equation. However, the optimality of the approximation over the space of outer search directions is ignored in the inner GMRES iteration. This leads to suboptimal corrections to the solution in the outer iteration, as components of the outer iteration directions may reenter in the inner iteration process. Therefore we propose to preserve the orthogonality relations of GCR in the inner GMRES iteration. This gives optimal corrections; however, it involves working with a singular, non-symmetric operator. We will discuss some important properties, and we will show by experiments that, in terms of matrix vector products, this modification (almost) always leads to better convergence. However, because we do more orthogonalizations, it does not always give an improved performance in CPU-time. Furthermore, we will discuss efficient implementations as well as the truncation possibilities of the outer GCR process. The experimental results indicate that for such methods it is advantageous to preserve the orthogonality in the inner iteration. Of course we can also use iteration schemes other than GMRES as the inner method; methods with short recurrences like GICGSTAB are of interest.
A Build-Up Interior Method for Linear Programming: Affine Scaling Form

DTIC Science & Technology

1990-02-01

initiating a major iteration imply convergence in a finite number of iterations. Each iteration t of the Dikin algorithm starts with an interior dual...this variant with the affine scaling method of Dikin [5] (in dual form). We have also looked into the analogous variant for the related Karmarkar’s...4] G. B. Dantzig, Linear Programming and Extensions (Princeton University Press, Princeton, NJ, 1963). [5] I. I. Dikin , "Iterative solution of
A Two-Dimensional Helmholtz Equation Solution for the Multiple Cavity Scattering Problem

DTIC Science & Technology

2013-02-01

obtained by using the block Gauss – Seidel iterative meth- od. To show the convergence of the iterative method, we define the error between two...models to the general multiple cavity setting. Numerical examples indicate that the convergence of the Gauss – Seidel iterative method depends on the...variational approach. A block Gauss – Seidel iterative method is introduced to solve the cou- pled system of the multiple cavity scattering problem, where
Methodes iteratives paralleles: Applications en neutronique et en mecanique des fluides

NASA Astrophysics Data System (ADS)

Qaddouri, Abdessamad

Dans cette these, le calcul parallele est applique successivement a la neutronique et a la mecanique des fluides. Dans chacune de ces deux applications, des methodes iteratives sont utilisees pour resoudre le systeme d'equations algebriques resultant de la discretisation des equations du probleme physique. Dans le probleme de neutronique, le calcul des matrices des probabilites de collision (PC) ainsi qu'un schema iteratif multigroupe utilisant une methode inverse de puissance sont parallelises. Dans le probleme de mecanique des fluides, un code d'elements finis utilisant un algorithme iteratif du type GMRES preconditionne est parallelise. Cette these est presentee sous forme de six articles suivis d'une conclusion. Les cinq premiers articles traitent des applications en neutronique, articles qui representent l'evolution de notre travail dans ce domaine. Cette evolution passe par un calcul parallele des matrices des PC et un algorithme multigroupe parallele teste sur un probleme unidimensionnel (article 1), puis par deux algorithmes paralleles l'un mutiregion l'autre multigroupe, testes sur des problemes bidimensionnels (articles 2--3). Ces deux premieres etapes sont suivies par l'application de deux techniques d'acceleration, le rebalancement neutronique et la minimisation du residu aux deux algorithmes paralleles (article 4). Finalement, on a mis en oeuvre l'algorithme multigroupe et le calcul parallele des matrices des PC sur un code de production DRAGON ou les tests sont plus realistes et peuvent etre tridimensionnels (article 5). Le sixieme article (article 6), consacre a l'application a la mecanique des fluides, traite la parallelisation d'un code d'elements finis FES ou le partitionneur de graphe METIS et la librairie PSPARSLIB sont utilises.
Generalizations of the Alternating Direction Method of Multipliers for Large-Scale and Distributed Optimization

DTIC Science & Technology

2014-05-01

exact one is solved later — as- signed as step 5 of Algorithm 2 — because at each iteration , the ADMM updates the variables in the Gauss - Seidel ...Jacobi ADMM (see Algo- rithm 5 below). Unlike the Gauss - Seidel ADMM, the Jacobi ADMM updates all the 70 blocks in parallel at every iteration : xk+1i...that extending ADMM straightforwardly from the classic Gauss - Seidel setting to the Jacobi setting, from two blocks to multiple blocks, will preserve
The generalised Sylvester matrix equations over the generalised bisymmetric and skew-symmetric matrices

NASA Astrophysics Data System (ADS)

Dehghan, Mehdi; Hajarian, Masoud

2012-08-01

A matrix P is called a symmetric orthogonal if P = P T = P -1. A matrix X is said to be a generalised bisymmetric with respect to P if X = X T = PXP. It is obvious that any symmetric matrix is also a generalised bisymmetric matrix with respect to I (identity matrix). By extending the idea of the Jacobi and the Gauss-Seidel iterations, this article proposes two new iterative methods, respectively, for computing the generalised bisymmetric (containing symmetric solution as a special case) and skew-symmetric solutions of the generalised Sylvester matrix equation ? (including Sylvester and Lyapunov matrix equations as special cases) which is encountered in many systems and control applications. When the generalised Sylvester matrix equation has a unique generalised bisymmetric (skew-symmetric) solution, the first (second) iterative method converges to the generalised bisymmetric (skew-symmetric) solution of this matrix equation for any initial generalised bisymmetric (skew-symmetric) matrix. Finally, some numerical results are given to illustrate the effect of the theoretical results.
A new approach for solving the three-dimensional steady Euler equations. I - General theory

NASA Technical Reports Server (NTRS)

Chang, S.-C.; Adamczyk, J. J.

1986-01-01

The present iterative procedure combines the Clebsch potentials and the Munk-Prim (1947) substitution principle with an extension of a semidirect Cauchy-Riemann solver to three dimensions, in order to solve steady, inviscid three-dimensional rotational flow problems in either subsonic or incompressible flow regimes. This solution procedure can be used, upon discretization, to obtain inviscid subsonic flow solutions in a 180-deg turning channel. In addition to accurately predicting the behavior of weak secondary flows, the algorithm can generate solutions for strong secondary flows and will yield acceptable flow solutions after only 10-20 outer loop iterations.
A new approach for solving the three-dimensional steady Euler equations. I - General theory

NASA Astrophysics Data System (ADS)

Chang, S.-C.; Adamczyk, J. J.

1986-08-01

The present iterative procedure combines the Clebsch potentials and the Munk-Prim (1947) substitution principle with an extension of a semidirect Cauchy-Riemann solver to three dimensions, in order to solve steady, inviscid three-dimensional rotational flow problems in either subsonic or incompressible flow regimes. This solution procedure can be used, upon discretization, to obtain inviscid subsonic flow solutions in a 180-deg turning channel. In addition to accurately predicting the behavior of weak secondary flows, the algorithm can generate solutions for strong secondary flows and will yield acceptable flow solutions after only 10-20 outer loop iterations.
Image reconstruction

NASA Astrophysics Data System (ADS)

Vasilenko, Georgii Ivanovich; Taratorin, Aleksandr Markovich

Linear, nonlinear, and iterative image-reconstruction (IR) algorithms are reviewed. Theoretical results are presented concerning controllable linear filters, the solution of ill-posed functional minimization problems, and the regularization of iterative IR algorithms. Attention is also given to the problem of superresolution and analytical spectrum continuation, the solution of the phase problem, and the reconstruction of images distorted by turbulence. IR in optical and optical-digital systems is discussed with emphasis on holographic techniques.
Integrated modelling of steady-state scenarios and heating and current drive mixes for ITER

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murakami, Masanori; Park, Jin Myung; Giruzzi, G.

2011-01-01

Recent progress on ITER steady-state (SS) scenario modelling by the ITPA-IOS group is reviewed. Code-to-code benchmarks as the IOS group's common activities for the two SS scenarios (weak shear scenario and internal transport barrier scenario) are discussed in terms of transport, kinetic profiles, and heating and current drive (CD) sources using various transport codes. Weak magnetic shear scenarios integrate the plasma core and edge by combining a theory-based transport model (GLF23) with scaled experimental boundary profiles. The edge profiles (at normalized radius rho = 0.8-1.0) are adopted from an edge-localized mode-averaged analysis of a DIII-D ITER demonstration discharge. A fullymore » noninductive SS scenario is achieved with fusion gain Q = 4.3, noninductive fraction f(NI) = 100%, bootstrap current fraction f(BS) = 63% and normalized beta beta(N) = 2.7 at plasma current I(p) = 8MA and toroidal field B(T) = 5.3 T using ITER day-1 heating and CD capability. Substantial uncertainties come from outside the radius of setting the boundary conditions (rho = 0.8). The present simulation assumed that beta(N)(rho) at the top of the pedestal (rho = 0.91) is about 25% above the peeling-ballooning threshold. ITER will have a challenge to achieve the boundary, considering different operating conditions (T(e)/T(i) approximate to 1 and density peaking). Overall, the experimentally scaled edge is an optimistic side of the prediction. A number of SS scenarios with different heating and CD mixes in a wide range of conditions were explored by exploiting the weak-shear steady-state solution procedure with the GLF23 transport model and the scaled experimental edge. The results are also presented in the operation space for DT neutron power versus stationary burn pulse duration with assumed poloidal flux availability at the beginning of stationary burn, indicating that the long pulse operation goal (3000s) at I(p) = 9 MA is possible. Source calculations in these simulations have been revised for electron cyclotron current drive including parallel momentum conservation effects and for neutral beam current drive with finite orbit and magnetic pitch effects.« less
Scalable splitting algorithms for big-data interferometric imaging in the SKA era

NASA Astrophysics Data System (ADS)

Onose, Alexandru; Carrillo, Rafael E.; Repetti, Audrey; McEwen, Jason D.; Thiran, Jean-Philippe; Pesquet, Jean-Christophe; Wiaux, Yves

2016-11-01

In the context of next-generation radio telescopes, like the Square Kilometre Array (SKA), the efficient processing of large-scale data sets is extremely important. Convex optimization tasks under the compressive sensing framework have recently emerged and provide both enhanced image reconstruction quality and scalability to increasingly larger data sets. We focus herein mainly on scalability and propose two new convex optimization algorithmic structures able to solve the convex optimization tasks arising in radio-interferometric imaging. They rely on proximal splitting and forward-backward iterations and can be seen, by analogy, with the CLEAN major-minor cycle, as running sophisticated CLEAN-like iterations in parallel in multiple data, prior, and image spaces. Both methods support any convex regularization function, in particular, the well-studied ℓ1 priors promoting image sparsity in an adequate domain. Tailored for big-data, they employ parallel and distributed computations to achieve scalability, in terms of memory and computational requirements. One of them also exploits randomization, over data blocks at each iteration, offering further flexibility. We present simulation results showing the feasibility of the proposed methods as well as their advantages compared to state-of-the-art algorithmic solvers. Our MATLAB code is available online on GitHub.
Performance evaluation of canny edge detection on a tiled multicore architecture

NASA Astrophysics Data System (ADS)

Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

2011-01-01

In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP,1 and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that for the same number of threads, programmer implemented, domain decomposition exhibits higher speed-ups than the compiler managed, loop-level parallelism implemented with OpenMP.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yeung, Yu-Hong; Pothen, Alex; Halappanavar, Mahantesh

We present an augmented matrix approach to update the solution to a linear system of equations when the coefficient matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to performmore » $N-x$ contingency analysis, i.e., determine the state of the system when up to $x$ links from $N$ fail. Our algorithms augment the coefficient matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms, a direct method, and a hybrid direct-iterative method for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the coefficient matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude), and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy. Our algorithms are capable of computing $N-x$ contingency analysis on a $778K$ bus grid, updating a solution with $x=20$ elements in $$1.6 \\times 10^{-2}$$ seconds on an Intel Xeon processor.« less

Parallel computation safety analysis irradiation targets fission product molybdenum in neutronic aspect using the successive over-relaxation algorithm

NASA Astrophysics Data System (ADS)

Susmikanti, Mike; Dewayatna, Winter; Sulistyo, Yos

2014-09-01

One of the research activities in support of commercial radioisotope production program is a safety research on target FPM (Fission Product Molybdenum) irradiation. FPM targets form a tube made of stainless steel which contains nuclear-grade high-enrichment uranium. The FPM irradiation tube is intended to obtain fission products. Fission materials such as Mo99 used widely the form of kits in the medical world. The neutronics problem is solved using first-order perturbation theory derived from the diffusion equation for four groups. In contrast, Mo isotopes have longer half-lives, about 3 days (66 hours), so the delivery of radioisotopes to consumer centers and storage is possible though still limited. The production of this isotope potentially gives significant economic value. The criticality and flux in multigroup diffusion model was calculated for various irradiation positions and uranium contents. This model involves complex computation, with large and sparse matrix system. Several parallel algorithms have been developed for the sparse and large matrix solution. In this paper, a successive over-relaxation (SOR) algorithm was implemented for the calculation of reactivity coefficients which can be done in parallel. Previous works performed reactivity calculations serially with Gauss-Seidel iteratives. The parallel method can be used to solve multigroup diffusion equation system and calculate the criticality and reactivity coefficients. In this research a computer code was developed to exploit parallel processing to perform reactivity calculations which were to be used in safety analysis. The parallel processing in the multicore computer system allows the calculation to be performed more quickly. This code was applied for the safety limits calculation of irradiated FPM targets containing highly enriched uranium. The results of calculations neutron show that for uranium contents of 1.7676 g and 6.1866 g (× 106 cm-1) in a tube, their delta reactivities are the still within safety limits; however, for 7.9542 g and 8.838 g (× 106 cm-1) the limits were exceeded.
Upwind relaxation methods for the Navier-Stokes equations using inner iterations

NASA Technical Reports Server (NTRS)

Taylor, Arthur C., III; Ng, Wing-Fai; Walters, Robert W.

1992-01-01

A subsonic and a supersonic problem are respectively treated by an upwind line-relaxation algorithm for the Navier-Stokes equations using inner iterations to accelerate steady-state solution convergence and thereby minimize CPU time. While the ability of the inner iterative procedure to mimic the quadratic convergence of the direct solver method is attested to in both test problems, some of the nonquadratic inner iterative results are noted to have been more efficient than the quadratic. In the more successful, supersonic test case, inner iteration required only about 65 percent of the line-relaxation method-entailed CPU time.
Achieving algorithmic resilience for temporal integration through spectral deferred corrections

DOE PAGES

Grout, Ray; Kolla, Hemanth; Minion, Michael; ...

2017-05-08

Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited to recovering frommore » soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. Here, we demonstrate the effectiveness of this strategy for both canonical test problems and a comprehensive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.« less
Achieving algorithmic resilience for temporal integration through spectral deferred corrections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grout, Ray; Kolla, Hemanth; Minion, Michael

2017-05-08

Spectral deferred corrections (SDC) is an iterative approach for constructing higher- order accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited tomore » recovering from soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual on the first correction iteration and changes slowly between successive iterations. We demonstrate the effectiveness of this strategy for both canonical test problems and a comprehen- sive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.« less
Achieving algorithmic resilience for temporal integration through spectral deferred corrections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grout, Ray; Kolla, Hemanth; Minion, Michael

2017-05-08

Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited to recovering frommore » soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. We demonstrate the effectiveness of this strategy for both canonical test problems and a comprehensive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.« less
Direct determination of one-dimensional interphase structures using normalized crystal truncation rod analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kawaguchi, Tomoya; Liu, Yihua; Reiter, Anthony

Here, a one-dimensional non-iterative direct method was employed for normalized crystal truncation rod analysis. The non-iterative approach, utilizing the Kramers–Kronig relation, avoids the ambiguities due to an improper initial model or incomplete convergence in the conventional iterative methods. The validity and limitations of the present method are demonstrated through both numerical simulations and experiments with Pt(111) in a 0.1 M CsF aqueous solution. The present method is compared with conventional iterative phase-retrieval methods.
Direct determination of one-dimensional interphase structures using normalized crystal truncation rod analysis

DOE PAGES

Kawaguchi, Tomoya; Liu, Yihua; Reiter, Anthony; ...

2018-04-20

Here, a one-dimensional non-iterative direct method was employed for normalized crystal truncation rod analysis. The non-iterative approach, utilizing the Kramers–Kronig relation, avoids the ambiguities due to an improper initial model or incomplete convergence in the conventional iterative methods. The validity and limitations of the present method are demonstrated through both numerical simulations and experiments with Pt(111) in a 0.1 M CsF aqueous solution. The present method is compared with conventional iterative phase-retrieval methods.
Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less
A Hardware-Accelerated Quantum Monte Carlo framework (HAQMC) for N-body systems

NASA Astrophysics Data System (ADS)

Gothandaraman, Akila; Peterson, Gregory D.; Warren, G. Lee; Hinde, Robert J.; Harrison, Robert J.

2009-12-01

Interest in the study of structural and energetic properties of highly quantum clusters, such as inert gas clusters has motivated the development of a hardware-accelerated framework for Quantum Monte Carlo simulations. In the Quantum Monte Carlo method, the properties of a system of atoms, such as the ground-state energies, are averaged over a number of iterations. Our framework is aimed at accelerating the computations in each iteration of the QMC application by offloading the calculation of properties, namely energy and trial wave function, onto reconfigurable hardware. This gives a user the capability to run simulations for a large number of iterations, thereby reducing the statistical uncertainty in the properties, and for larger clusters. This framework is designed to run on the Cray XD1 high performance reconfigurable computing platform, which exploits the coarse-grained parallelism of the processor along with the fine-grained parallelism of the reconfigurable computing devices available in the form of field-programmable gate arrays. In this paper, we illustrate the functioning of the framework, which can be used to calculate the energies for a model cluster of helium atoms. In addition, we present the capabilities of the framework that allow the user to vary the chemical identities of the simulated atoms. Program summaryProgram title: Hardware Accelerated Quantum Monte Carlo (HAQMC) Catalogue identifier: AEEP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEP_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 691 537 No. of bytes in distributed program, including test data, etc.: 5 031 226 Distribution format: tar.gz Programming language: C/C++ for the QMC application, VHDL and Xilinx 8.1 ISE/EDK tools for FPGA design and development Computer: Cray XD1 consisting of a dual-core, dualprocessor AMD Opteron 2.2 GHz with a Xilinx Virtex-4 (V4LX160) or Xilinx Virtex-II Pro (XC2VP50) FPGA per node. We use the compute node with the Xilinx Virtex-4 FPGA Operating system: Red Hat Enterprise Linux OS Has the code been vectorised or parallelized?: Yes Classification: 6.1 Nature of problem: Quantum Monte Carlo is a practical method to solve the Schrödinger equation for large many-body systems and obtain the ground-state properties of such systems. This method involves the sampling of a number of configurations of atoms and averaging the properties of the configurations over a number of iterations. We are interested in applying the QMC method to obtain the energy and other properties of highly quantum clusters, such as inert gas clusters. Solution method: The proposed framework provides a combined hardware-software approach, in which the QMC simulation is performed on the host processor, with the computationally intensive functions such as energy and trial wave function computations mapped onto the field-programmable gate array (FPGA) logic device attached as a co-processor to the host processor. We perform the QMC simulation for a number of iterations as in the case of our original software QMC approach, to reduce the statistical uncertainty of the results. However, our proposed HAQMC framework accelerates each iteration of the simulation, by significantly reducing the time taken to calculate the ground-state properties of the configurations of atoms, thereby accelerating the overall QMC simulation. We provide a generic interpolation framework that can be extended to study a variety of pure and doped atomic clusters, irrespective of the chemical identities of the atoms. For the FPGA implementation of the properties, we use a two-region approach for accurately computing the properties over the entire domain, employ deep pipelines and fixed-point for all our calculations guaranteeing the accuracy required for our simulation.
Exploiting loop level parallelism in nonprocedural dataflow programs

NASA Technical Reports Server (NTRS)

Gokhale, Maya B.

1987-01-01

Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.
Exact exchange potential evaluated from occupied Kohn-Sham and Hartree-Fock solutions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cinal, M.; Holas, A.

2011-06-15

The reported algorithm determines the exact exchange potential v{sub x} in an iterative way using energy shifts (ESs) and orbital shifts (OSs) obtained with finite-difference formulas from the solutions (occupied orbitals and their energies) of the Hartree-Fock-like equation and the Kohn-Sham-like equation, the former used for the initial approximation to v{sub x} and the latter for increments of ES and OS due to subsequent changes of v{sub x}. Thus, the need for solution of the differential equations for OSs, used by Kuemmel and Perdew [Phys. Rev. Lett. 90, 043004 (2003)], is bypassed. The iterated exchange potential, expressed in terms ofmore » ESs and OSs, is improved by modifying ESs at odd iteration steps and OSs at even steps. The modification formulas are related to the optimized-effective-potential equation (satisfied at convergence) written as the condition of vanishing density shift (DS). They are obtained, respectively, by enforcing its satisfaction through corrections to approximate OSs and by determining the optimal ESs that minimize the DS norm. The proposed method, successfully tested for several closed-(sub)shell atoms, from Be to Kr, within the density functional theory exchange-only approximation, proves highly efficient. The calculations using the pseudospectral method for representing orbitals give iterative sequences of approximate exchange potentials (starting with the Krieger-Li-Iafrate approximation) that rapidly approach the exact v{sub x} so that, for Ne, Ar, and Zn, the corresponding DS norm becomes less than 10{sup -6} after 13, 13, and 9 iteration steps for a given electron density. In self-consistent density calculations, orbital energies of 10{sup -4} hartree accuracy are obtained for these atoms after, respectively, 9, 12, and 12 density iteration steps, each involving just two steps of v{sub x} iteration, while the accuracy limit of 10{sup -6} to 10{sup -7} hartree is reached after 20 density iterations.« less
Exact exchange potential evaluated from occupied Kohn-Sham and Hartree-Fock solutions

NASA Astrophysics Data System (ADS)

Cinal, M.; Holas, A.

2011-06-01

The reported algorithm determines the exact exchange potential vx in an iterative way using energy shifts (ESs) and orbital shifts (OSs) obtained with finite-difference formulas from the solutions (occupied orbitals and their energies) of the Hartree-Fock-like equation and the Kohn-Sham-like equation, the former used for the initial approximation to vx and the latter for increments of ES and OS due to subsequent changes of vx. Thus, the need for solution of the differential equations for OSs, used by Kümmel and Perdew [Phys. Rev. Lett.PRLTAO0031-900710.1103/PhysRevLett.90.043004 90, 043004 (2003)], is bypassed. The iterated exchange potential, expressed in terms of ESs and OSs, is improved by modifying ESs at odd iteration steps and OSs at even steps. The modification formulas are related to the optimized-effective-potential equation (satisfied at convergence) written as the condition of vanishing density shift (DS). They are obtained, respectively, by enforcing its satisfaction through corrections to approximate OSs and by determining the optimal ESs that minimize the DS norm. The proposed method, successfully tested for several closed-(sub)shell atoms, from Be to Kr, within the density functional theory exchange-only approximation, proves highly efficient. The calculations using the pseudospectral method for representing orbitals give iterative sequences of approximate exchange potentials (starting with the Krieger-Li-Iafrate approximation) that rapidly approach the exact vx so that, for Ne, Ar, and Zn, the corresponding DS norm becomes less than 10-6 after 13, 13, and 9 iteration steps for a given electron density. In self-consistent density calculations, orbital energies of 10-4 hartree accuracy are obtained for these atoms after, respectively, 9, 12, and 12 density iteration steps, each involving just two steps of vx iteration, while the accuracy limit of 10-6 to 10-7 hartree is reached after 20 density iterations.
PRIM: An Efficient Preconditioning Iterative Reweighted Least Squares Method for Parallel Brain MRI Reconstruction.

PubMed

Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou

2018-02-08

The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. While joint total variation (JTV) regularized model has been demonstrated as a powerful tool in increasing sampling speed for pMRI, however, the major bottleneck is the inefficiency of the optimization method. While all present state-of-the-art optimizations for the JTV model could only reach a sublinear convergence rate, in this paper, we squeeze the performance by proposing a linear-convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI regarding both accuracy and efficiency compared with state-of-the-art methods.
Assessing the Structure of the Ways of Coping Questionnaire in Fibromyalgia Patients Using Common Factor Analytic Approaches.

PubMed

Van Liew, Charles; Santoro, Maya S; Edwards, Larissa; Kang, Jeremy; Cronan, Terry A

2016-01-01

The Ways of Coping Questionnaire (WCQ) is a widely used measure of coping processes. Despite its use in a variety of populations, there has been concern about the stability and structure of the WCQ across different populations. This study examines the factor structure of the WCQ in a large sample of individuals diagnosed with fibromyalgia. The participants were 501 adults (478 women) who were part of a larger intervention study. Participants completed the WCQ at their 6-month assessment. Foundational factoring approaches were performed on the data (i.e., maximum likelihood factoring [MLF], iterative principal factoring [IPF], principal axis factoring (PAF), and principal components factoring [PCF]) with oblique oblimin rotation. Various criteria were evaluated to determine the number of factors to be extracted, including Kaiser's rule, Scree plot visual analysis, 5 and 10% unique variance explained, 70 and 80% communal variance explained, and Horn's parallel analysis (PA). It was concluded that the 4-factor PAF solution was the preferable solution, based on PA extraction and the fact that this solution minimizes nonvocality and multivocality. The present study highlights the need for more research focused on defining the limits of the WCQ and the degree to which population-specific and context-specific subscale adjustments are needed.
Assessing the Structure of the Ways of Coping Questionnaire in Fibromyalgia Patients Using Common Factor Analytic Approaches

PubMed Central

Edwards, Larissa; Kang, Jeremy

2016-01-01

The Ways of Coping Questionnaire (WCQ) is a widely used measure of coping processes. Despite its use in a variety of populations, there has been concern about the stability and structure of the WCQ across different populations. This study examines the factor structure of the WCQ in a large sample of individuals diagnosed with fibromyalgia. The participants were 501 adults (478 women) who were part of a larger intervention study. Participants completed the WCQ at their 6-month assessment. Foundational factoring approaches were performed on the data (i.e., maximum likelihood factoring [MLF], iterative principal factoring [IPF], principal axis factoring (PAF), and principal components factoring [PCF]) with oblique oblimin rotation. Various criteria were evaluated to determine the number of factors to be extracted, including Kaiser's rule, Scree plot visual analysis, 5 and 10% unique variance explained, 70 and 80% communal variance explained, and Horn's parallel analysis (PA). It was concluded that the 4-factor PAF solution was the preferable solution, based on PA extraction and the fact that this solution minimizes nonvocality and multivocality. The present study highlights the need for more research focused on defining the limits of the WCQ and the degree to which population-specific and context-specific subscale adjustments are needed. PMID:28070160
A superlinear interior points algorithm for engineering design optimization

NASA Technical Reports Server (NTRS)

Herskovits, J.; Asquier, J.

1990-01-01

We present a quasi-Newton interior points algorithm for nonlinear constrained optimization. It is based on a general approach consisting of the iterative solution in the primal and dual spaces of the equalities in Karush-Kuhn-Tucker optimality conditions. This is done in such a way to have primal and dual feasibility at each iteration, which ensures satisfaction of those optimality conditions at the limit points. This approach is very strong and efficient, since at each iteration it only requires the solution of two linear systems with the same matrix, instead of quadratic programming subproblems. It is also particularly appropriate for engineering design optimization inasmuch at each iteration a feasible design is obtained. The present algorithm uses a quasi-Newton approximation of the second derivative of the Lagrangian function in order to have superlinear asymptotic convergence. We discuss theoretical aspects of the algorithm and its computer implementation.
Multivariable frequency domain identification via 2-norm minimization

NASA Technical Reports Server (NTRS)

Bayard, David S.

1992-01-01

The author develops a computational approach to multivariable frequency domain identification, based on 2-norm minimization. In particular, a Gauss-Newton (GN) iteration is developed to minimize the 2-norm of the error between frequency domain data and a matrix fraction transfer function estimate. To improve the global performance of the optimization algorithm, the GN iteration is initialized using the solution to a particular sequentially reweighted least squares problem, denoted as the SK iteration. The least squares problems which arise from both the SK and GN iterations are shown to involve sparse matrices with identical block structure. A sparse matrix QR factorization method is developed to exploit the special block structure, and to efficiently compute the least squares solution. A numerical example involving the identification of a multiple-input multiple-output (MIMO) plant having 286 unknown parameters is given to illustrate the effectiveness of the algorithm.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2016-01-01

Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frinks, Neal T.

2016-01-01

Several improvements to the mixed-elementUSM3Ddiscretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Policy Iteration for $H_\\infty $ Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming.

PubMed

Zhu, Yuanheng; Zhao, Dongbin; Yang, Xiong; Zhang, Qichao

2018-02-01

Sum of squares (SOS) polynomials have provided a computationally tractable way to deal with inequality constraints appearing in many control problems. It can also act as an approximator in the framework of adaptive dynamic programming. In this paper, an approximate solution to the optimal control of polynomial nonlinear systems is proposed. Under a given attenuation coefficient, the Hamilton-Jacobi-Isaacs equation is relaxed to an optimization problem with a set of inequalities. After applying the policy iteration technique and constraining inequalities to SOS, the optimization problem is divided into a sequence of feasible semidefinite programming problems. With the converged solution, the attenuation coefficient is further minimized to a lower value. After iterations, approximate solutions to the smallest -gain and the associated optimal controller are obtained. Four examples are employed to verify the effectiveness of the proposed algorithm.

The solution of radiative transfer problems in molecular bands without the LTE assumption by accelerated lambda iteration methods

NASA Technical Reports Server (NTRS)

Kutepov, A. A.; Kunze, D.; Hummer, D. G.; Rybicki, G. B.

1991-01-01

An iterative method based on the use of approximate transfer operators, which was designed initially to solve multilevel NLTE line formation problems in stellar atmospheres, is adapted and applied to the solution of the NLTE molecular band radiative transfer in planetary atmospheres. The matrices to be constructed and inverted are much smaller than those used in the traditional Curtis matrix technique, which makes possible the treatment of more realistic problems using relatively small computers. This technique converges much more rapidly than straightforward iteration between the transfer equation and the equations of statistical equilibrium. A test application of this new technique to the solution of NLTE radiative transfer problems for optically thick and thin bands (the 4.3 micron CO2 band in the Venusian atmosphere and the 4.7 and 2.3 micron CO bands in the earth's atmosphere) is described.
Discrete Self-Similarity in Interfacial Hydrodynamics and the Formation of Iterated Structures.

PubMed

Dallaston, Michael C; Fontelos, Marco A; Tseluiko, Dmitri; Kalliadasis, Serafim

2018-01-19

The formation of iterated structures, such as satellite and subsatellite drops, filaments, and bubbles, is a common feature in interfacial hydrodynamics. Here we undertake a computational and theoretical study of their origin in the case of thin films of viscous fluids that are destabilized by long-range molecular or other forces. We demonstrate that iterated structures appear as a consequence of discrete self-similarity, where certain patterns repeat themselves, subject to rescaling, periodically in a logarithmic time scale. The result is an infinite sequence of ridges and filaments with similarity properties. The character of these discretely self-similar solutions as the result of a Hopf bifurcation from ordinarily self-similar solutions is also described.
Further investigation on "A multiplicative regularization for force reconstruction"

NASA Astrophysics Data System (ADS)

Aucejo, M.; De Smet, O.

2018-05-01

We have recently proposed a multiplicative regularization to reconstruct mechanical forces acting on a structure from vibration measurements. This method does not require any selection procedure for choosing the regularization parameter, since the amount of regularization is automatically adjusted throughout an iterative resolution process. The proposed iterative algorithm has been developed with performance and efficiency in mind, but it is actually a simplified version of a full iterative procedure not described in the original paper. The present paper aims at introducing the full resolution algorithm and comparing it with its simplified version in terms of computational efficiency and solution accuracy. In particular, it is shown that both algorithms lead to very similar identified solutions.
Layout optimization with algebraic multigrid methods

NASA Technical Reports Server (NTRS)

Regler, Hans; Ruede, Ulrich

1993-01-01

Finding the optimal position for the individual cells (also called functional modules) on the chip surface is an important and difficult step in the design of integrated circuits. This paper deals with the problem of relative placement, that is the minimization of a quadratic functional with a large, sparse, positive definite system matrix. The basic optimization problem must be augmented by constraints to inhibit solutions where cells overlap. Besides classical iterative methods, based on conjugate gradients (CG), we show that algebraic multigrid methods (AMG) provide an interesting alternative. For moderately sized examples with about 10000 cells, AMG is already competitive with CG and is expected to be superior for larger problems. Besides the classical 'multiplicative' AMG algorithm where the levels are visited sequentially, we propose an 'additive' variant of AMG where levels may be treated in parallel and that is suitable as a preconditioner in the CG algorithm.
3D CAFE modeling of grain structures: application to primary dendritic and secondary eutectic solidification

NASA Astrophysics Data System (ADS)

Carozzani, T.; Digonnet, H.; Gandin, Ch-A.

2012-01-01

A three-dimensional model is presented for the prediction of grain structures formed in casting. It is based on direct tracking of grain boundaries using a cellular automaton (CA) method. The model is fully coupled with a solution of the heat flow computed with a finite element (FE) method. Several unique capabilities are implemented including (i) the possibility to track the development of several types of grain structures, e.g. dendritic and eutectic grains, (ii) a coupling scheme that permits iterations between the FE method and the CA method, and (iii) tabulated enthalpy curves for the solid and liquid phases that offer the possibility to work with multicomponent alloys. The present CAFE model is also fully parallelized and runs on a cluster of computers. Demonstration is provided by direct comparison between simulated and recorded cooling curves for a directionally solidified aluminum-7 wt% silicon alloy.
Sierra/Solid Mechanics 4.48 User's Guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Merewether, Mark Thomas; Crane, Nathan K; de Frias, Gabriel Jose

Sierra/SolidMechanics (Sierra/SM) is a Lagrangian, three-dimensional code for finite element analysis of solids and structures. It provides capabilities for explicit dynamic, implicit quasistatic and dynamic analyses. The explicit dynamics capabilities allow for the efficient and robust solution of models with extensive contact subjected to large, suddenly applied loads. For implicit problems, Sierra/SM uses a multi-level iterative solver, which enables it to effectively solve problems with large deformations, nonlinear material behavior, and contact. Sierra/SM has a versatile library of continuum and structural elements, and a large library of material models. The code is written for parallel computing environments enabling scalable solutionsmore » of extremely large problems for both implicit and explicit analyses. It is built on the SIERRA Framework, which facilitates coupling with other SIERRA mechanics codes. This document describes the functionality and input syntax for Sierra/SM.« less
Toroidal Ampere-Faraday Equations Solved Consistently with the CQL3D Fokker-Planck Time-Evolution

NASA Astrophysics Data System (ADS)

Harvey, R. W.; Petrov, Yu. V.

2013-10-01

A self-consistent, time-dependent toroidal electric field calculation is a key feature of a complete 3D Fokker-Planck kinetic distribution radial transport code for f(v,theta,rho,t). In the present CQL3D finite-difference model, the electric field E(rho,t) is either prescribed, or iteratively adjusted to obtain prescribed toroidal or parallel currents. We discuss first results of an implementation of the Ampere-Faraday equation for the self-consistent toroidal electric field, as applied to the runaway electron production in tokamaks due to rapid reduction of the plasma temperature as occurs in a plasma disruption. Our previous results assuming a constant current density (Lenz' Law) model showed that prompt ``hot-tail runaways'' dominated ``knock-on'' and Dreicer ``drizzle'' runaways; we will examine modifications due to the more complete Ampere-Faraday solution. Work supported by US DOE under DE-FG02-ER54744.
Accelerating NLTE radiative transfer by means of the Forth-and-Back Implicit Lambda Iteration: A two-level atom line formation in 2D Cartesian coordinates

NASA Astrophysics Data System (ADS)

Milić, Ivan; Atanacković, Olga

2014-10-01

State-of-the-art methods in multidimensional NLTE radiative transfer are based on the use of local approximate lambda operator within either Jacobi or Gauss-Seidel iterative schemes. Here we propose another approach to the solution of 2D NLTE RT problems, Forth-and-Back Implicit Lambda Iteration (FBILI), developed earlier for 1D geometry. In order to present the method and examine its convergence properties we use the well-known instance of the two-level atom line formation with complete frequency redistribution. In the formal solution of the RT equation we employ short characteristics with two-point algorithm. Using an implicit representation of the source function in the computation of the specific intensities, we compute and store the coefficients of the linear relations J=a+bS between the mean intensity J and the corresponding source function S. The use of iteration factors in the ‘local’ coefficients of these implicit relations in two ‘inward’ sweeps of 2D grid, along with the update of the source function in other two ‘outward’ sweeps leads to four times faster solution than the Jacobi’s one. Moreover, the update made in all four consecutive sweeps of the grid leads to an acceleration by a factor of 6-7 compared to the Jacobi iterative scheme.
One-Dimensional Fokker-Planck Equation with Quadratically Nonlinear Quasilocal Drift

NASA Astrophysics Data System (ADS)

Shapovalov, A. V.

2018-04-01

The Fokker-Planck equation in one-dimensional spacetime with quadratically nonlinear nonlocal drift in the quasilocal approximation is reduced with the help of scaling of the coordinates and time to a partial differential equation with a third derivative in the spatial variable. Determining equations for the symmetries of the reduced equation are derived and the Lie symmetries are found. A group invariant solution having the form of a traveling wave is found. Within the framework of Adomian's iterative method, the first iterations of an approximate solution of the Cauchy problem are obtained. Two illustrative examples of exact solutions are found.
Chemical Transport in a Fissured Rock: Verification of a Numerical Model

NASA Astrophysics Data System (ADS)

Rasmuson, A.; Narasimhan, T. N.; Neretnieks, I.

1982-10-01

Numerical models for simulating chemical transport in fissured rocks constitute powerful tools for evaluating the acceptability of geological nuclear waste repositories. Due to the very long-term, high toxicity of some nuclear waste products, the models are required to predict, in certain cases, the spatial and temporal distribution of chemical concentration less than 0.001% of the concentration released from the repository. Whether numerical models can provide such accuracies is a major question addressed in the present work. To this end we have verified a numerical model, TRUMP, which solves the advective diffusion equation in general three dimensions, with or without decay and source terms. The method is based on an integrated finite difference approach. The model was verified against known analytic solution of the one-dimensional advection-diffusion problem, as well as the problem of advection-diffusion in a system of parallel fractures separated by spherical particles. The studies show that as long as the magnitude of advectance is equal to or less than that of conductance for the closed surface bounding any volume element in the region (that is, numerical Peclet number <2), the numerical method can indeed match the analytic solution within errors of ±10-3% or less. The realistic input parameters used in the sample calculations suggest that such a range of Peclet numbers is indeed likely to characterize deep groundwater systems in granitic and ancient argillaceous systems. Thus TRUMP in its present form does provide a viable tool for use in nuclear waste evaluation studies. A sensitivity analysis based on the analytic solution suggests that the errors in prediction introduced due to uncertainties in input parameters are likely to be larger than the computational inaccuracies introduced by the numerical model. Currently, a disadvantage in the TRUMP model is that the iterative method of solving the set of simultaneous equations is rather slow when time constants vary widely over the flow region. Although the iterative solution may be very desirable for large three-dimensional problems in order to minimize computer storage, it seems desirable to use a direct solver technique in conjunction with the mixed explicit-implicit approach whenever possible. Work in this direction is in progress.
Acoustic and elastic multiple scattering and radiation from cylindrical structures

NASA Astrophysics Data System (ADS)

Amirkulova, Feruza Abdukadirovna

Multiple scattering (MS) and radiation of waves by a system of scatterers is of great theoretical and practical importance and is required in a wide variety of physical contexts such as the implementation of "invisibility" cloaks, the effective parameter characterization, and the fabrication of dynamically tunable structures, etc. The dissertation develops fast, rapidly convergent iterative techniques to expedite the solution of MS problems. The formulation of MS problems reduces to a system of linear algebraic equations using Graf's theorem and separation of variables. The iterative techniques are developed using Neumann expansion and Block Toeplitz structure of the linear system; they are very general, and suitable for parallel computations and a large number of MS problems, i.e. acoustic, elastic, electromagnetic, etc., and used for the first time to solve MS problems. The theory is implemented in Matlab and FORTRAN, and the theoretical predictions are compared to computations obtained by COMSOL. To formulate the MS problem, the transition matrix is obtained by analyzing an acoustic and an elastic single scattering of incident waves by elastic isotropic and anisotropic solids. The mathematical model of wave scattering from multilayered cylindrical and spherical structures is developed by means of an exact solution of dynamic 3D elasticity theory. The recursive impedance matrix algorithm is derived for radially heterogeneous anisotropic solids. An explicit method for finding the impedance in piecewise uniform, transverse-isotropic material is proposed; the solution is compared to elasticity theory solutions involving Buchwald potentials. Furthermore, active exterior cloaking devices are modeled for acoustic and elastic media using multipole sources. A cloaking device can render an object invisible to some incident waves as seen by some external observer. The active cloak is generated by a discrete set of multipole sources that destructively interfere with an incident wave to produce zero total field over a finite spatial region. The approach precisely determines the necessary source amplitudes and enables a cloaked region to be determined using Graf's theorem. To apply the approach, the infinite series of multipole expansions are truncated, and the accuracy of cloaking is studied by modifying the truncation parameter.
Fast iterative censoring CFAR algorithm for ship detection from SAR images

NASA Astrophysics Data System (ADS)

Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng

2017-11-01

Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure to eliminate the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; and then, an iterative censoring CFAR algorithm is used to detect ship candidates from each target blocks adaptively and efficiently, where parallel detection is available, and statistical parameters of G0 distribution fitting local sea clutter well can be quickly estimated based on an integral image operator. Experimental results of TerraSAR-X images demonstrate the effectiveness of the proposed technique.
Exploiting Vector and Multicore Parallelsim for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that thesemore » schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.« less
Validity and reliability of an in-training evaluation report to measure the CanMEDS roles in emergency medicine residents.

PubMed

Kassam, Aliya; Donnon, Tyrone; Rigby, Ian

2014-03-01

There is a question of whether a single assessment tool can assess the key competencies of residents as mandated by the Royal College of Physicians and Surgeons of Canada CanMEDS roles framework. The objective of the present study was to investigate the reliability and validity of an emergency medicine (EM) in-training evaluation report (ITER). ITER data from 2009 to 2011 were combined for residents across the 5 years of the EM residency training program. An exploratory factor analysis with varimax rotation was used to explore the construct validity of the ITER. A total of 172 ITERs were completed on residents across their first to fifth year of training. A combined, 24-item ITER yielded a five-factor solution measuring the CanMEDs role Medical Expert/Scholar, Communicator/Collaborator, Professional, Health Advocate and Manager subscales. The factor solution accounted for 79% of the variance, and reliability coefficients (Cronbach alpha) ranged from α = 0.90 to 0.95 for each subscale and α = 0.97 overall. The combined, 24-item ITER used to assess residents' competencies in the EM residency program showed strong reliability and evidence of construct validity for assessment of the CanMEDS roles. Further research is needed to develop and test ITER items that will differentiate each CanMEDS role exclusively.
An Iterated Global Mascon Solution with Focus on Land Ice Mass Evolution

NASA Technical Reports Server (NTRS)

Luthcke, S. B.; Sabaka, T.; Rowlands, D. D.; Lemoine, F. G.; Loomis, B. D.; Boy, J. P.

2012-01-01

Land ice mass evolution is determined from a new GRACE global mascon solution. The solution is estimated directly from the reduction of the inter-satellite K-band range rate observations taking into account the full noise covariance, and formally iterating the solution. The new solution increases signal recovery while reducing the GRACE KBRR observation residuals. The mascons are estimated with 10-day and 1-arc-degree equal area sampling, applying anisotropic constraints for enhanced temporal and spatial resolution of the recovered land ice signal. The details of the solution are presented including error and resolution analysis. An Ensemble Empirical Mode Decomposition (EEMD) adaptive filter is applied to the mascon solution time series to compute timing of balance seasons and annual mass balances. The details and causes of the spatial and temporal variability of the land ice regions studied are discussed.
Nonlinear Analysis of Cavitating Propellers in Nonuniform Flow

DTIC Science & Technology

1992-10-16

Helmholtz more than a century ago [4]. The method was later extended to treat curved bodies at zero cavitation number by Levi - Civita [4]. The theory was...122, 1895. [63] M.P. Tulin. Steady two -dimensional cavity flows about slender bodies . Technical Report 834, DTMB, May 1953. [64] M.P. Tulin...iterative solution for two -dimensional flows is remarkably fast and that the accuracy of the first iteration solution is sufficient for a wide range of
The application of MINIQUASI to thermal program boundary and initial value problems

NASA Technical Reports Server (NTRS)

1974-01-01

The feasibility of applying the solution techniques of Miniquasi to the set of equations which govern a thermoregulatory model is investigated. For solving nonlinear equations and/or boundary conditions, a Taylor Series expansion is required for linearization of both equations and boundary conditions. The solutions are iterative and in each iteration, a problem like the linear case is solved. It is shown that Miniquasi cannot be applied to the thermoregulatory model as originally planned.
Singular boundary method for global gravity field modelling

NASA Astrophysics Data System (ADS)

Cunderlik, Robert

2014-05-01

The singular boundary method (SBM) and method of fundamental solutions (MFS) are meshless boundary collocation techniques that use the fundamental solution of a governing partial differential equation (e.g. the Laplace equation) as their basis functions. They have been developed to avoid singular numerical integration as well as mesh generation in the traditional boundary element method (BEM). SBM have been proposed to overcome a main drawback of MFS - its controversial fictitious boundary outside the domain. The key idea of SBM is to introduce a concept of the origin intensity factors that isolate singularities of the fundamental solution and its derivatives using some appropriate regularization techniques. Consequently, the source points can be placed directly on the real boundary and coincide with the collocation nodes. In this study we deal with SBM applied for high-resolution global gravity field modelling. The first numerical experiment presents a numerical solution to the fixed gravimetric boundary value problem. The achieved results are compared with the numerical solutions obtained by MFS or the direct BEM indicating efficiency of all methods. In the second numerical experiments, SBM is used to derive the geopotential and its first derivatives from the Tzz components of the gravity disturbing tensor observed by the GOCE satellite mission. A determination of the origin intensity factors allows to evaluate the disturbing potential and gravity disturbances directly on the Earth's surface where the source points are located. To achieve high-resolution numerical solutions, the large-scale parallel computations are performed on the cluster with 1TB of the distributed memory and an iterative elimination of far zones' contributions is applied.
Albany/FELIX: A parallel, scalable and robust, finite element, first-order Stokes approximation ice sheet solver built for advanced analysis

DOE PAGES

Tezaur, I. K.; Perego, M.; Salinger, A. G.; ...

2015-04-27

This paper describes a new parallel, scalable and robust finite element based solver for the first-order Stokes momentum balance equations for ice flow. The solver, known as Albany/FELIX, is constructed using the component-based approach to building application codes, in which mature, modular libraries developed as a part of the Trilinos project are combined using abstract interfaces and template-based generic programming, resulting in a final code with access to dozens of algorithmic and advanced analysis capabilities. Following an overview of the relevant partial differential equations and boundary conditions, the numerical methods chosen to discretize the ice flow equations are described, alongmore » with their implementation. The results of several verification studies of the model accuracy are presented using (1) new test cases for simplified two-dimensional (2-D) versions of the governing equations derived using the method of manufactured solutions, and (2) canonical ice sheet modeling benchmarks. Model accuracy and convergence with respect to mesh resolution are then studied on problems involving a realistic Greenland ice sheet geometry discretized using hexahedral and tetrahedral meshes. Also explored as a part of this study is the effect of vertical mesh resolution on the solution accuracy and solver performance. The robustness and scalability of our solver on these problems is demonstrated. Lastly, we show that good scalability can be achieved by preconditioning the iterative linear solver using a new algebraic multilevel preconditioner, constructed based on the idea of semi-coarsening.« less
Air Force Maui Optical and Supercomputing Site (AMOS) Application Briefs 2004

DTIC Science & Technology

2004-01-01

Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection...17. LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 60 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b. ABSTRACT...SOR methods, the work in parallel matrix iterative methods is very relevant to the parallelization of a CA. The Moore neighborhood used in this work

Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

NASA Astrophysics Data System (ADS)

Chacon, L.; Finn, J. M.; Knoll, D. A.

2000-10-01

Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
A numerical scheme to solve unstable boundary value problems

NASA Technical Reports Server (NTRS)

Kalnay-Rivas, E.

1977-01-01

The considered scheme makes it possible to determine an unstable steady state solution in cases in which, because of lack of symmetry, such a solution cannot be obtained analytically, and other time integration or relaxation schemes, because of instability, fail to converge. The iterative solution of a single complex equation is discussed and a nonlinear system of equations is considered. Described applications of the scheme are related to a steady state solution with shear instability, an unstable nonlinear Ekman boundary layer, and the steady state solution of a baroclinic atmosphere with asymmetric forcing. The scheme makes use of forward and backward time integrations of the original spatial differential operators and of an approximation of the adjoint operators. Only two computations of the time derivative per iteration are required.
How to Compute Labile Metal-Ligand Equilibria

ERIC Educational Resources Information Center

de Levie, Robert

2007-01-01

The different methods used for computing labile metal-ligand complexes, which are suitable for an iterative computer solution, are illustrated. The ligand function has allowed students to relegate otherwise tedious iterations to a computer, while retaining complete control over what is calculated.
A new solution procedure for a nonlinear infinite beam equation of motion

NASA Astrophysics Data System (ADS)

Jang, T. S.

2016-10-01

Our goal of this paper is of a purely theoretical question, however which would be fundamental in computational partial differential equations: Can a linear solution-structure for the equation of motion for an infinite nonlinear beam be directly manipulated for constructing its nonlinear solution? Here, the equation of motion is modeled as mathematically a fourth-order nonlinear partial differential equation. To answer the question, a pseudo-parameter is firstly introduced to modify the equation of motion. And then, an integral formalism for the modified equation is found here, being taken as a linear solution-structure. It enables us to formulate a nonlinear integral equation of second kind, equivalent to the original equation of motion. The fixed point approach, applied to the integral equation, results in proposing a new iterative solution procedure for constructing the nonlinear solution of the original beam equation of motion, which consists luckily of just the simple regular numerical integration for its iterative process; i.e., it appears to be fairly simple as well as straightforward to apply. A mathematical analysis is carried out on both natures of convergence and uniqueness of the iterative procedure by proving a contractive character of a nonlinear operator. It follows conclusively,therefore, that it would be one of the useful nonlinear strategies for integrating the equation of motion for a nonlinear infinite beam, whereby the preceding question may be answered. In addition, it may be worth noticing that the pseudo-parameter introduced here has double roles; firstly, it connects the original beam equation of motion with the integral equation, second, it is related with the convergence of the iterative method proposed here.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

NASA Astrophysics Data System (ADS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-07-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
Block Gauss elimination followed by a classical iterative method for the solution of linear systems

NASA Astrophysics Data System (ADS)

Alanelli, Maria; Hadjidimos, Apostolos

2004-02-01

In the last two decades many papers have appeared in which the application of an iterative method for the solution of a linear system is preceded by a step of the Gauss elimination process in the hope that this will increase the rates of convergence of the iterative method. This combination of methods has been proven successful especially when the matrix A of the system is an M-matrix. The purpose of this paper is to extend the idea of one to more Gauss elimination steps, consider other classes of matrices A, e.g., p-cyclic consistently ordered, and generalize and improve the asymptotic convergence rates of some of the methods known so far.
Multidirectional hybrid algorithm for the split common fixed point problem and application to the split common null point problem.

PubMed

Li, Xia; Guo, Meifang; Su, Yongfu

2016-01-01

In this article, a new multidirectional monotone hybrid iteration algorithm for finding a solution to the split common fixed point problem is presented for two countable families of quasi-nonexpansive mappings in Banach spaces. Strong convergence theorems are proved. The application of the result is to consider the split common null point problem of maximal monotone operators in Banach spaces. Strong convergence theorems for finding a solution of the split common null point problem are derived. This iteration algorithm can accelerate the convergence speed of iterative sequence. The results of this paper improve and extend the recent results of Takahashi and Yao (Fixed Point Theory Appl 2015:87, 2015) and many others .
FPGA implementation of low complexity LDPC iterative decoder

NASA Astrophysics Data System (ADS)

Verma, Shivani; Sharma, Sanjay

2016-07-01

Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family.
Drifts, currents, and power scrape-off width in SOLPS-ITER modeling of DIII-D

DOE PAGES

Meier, E. T.; Goldston, R. J.; Kaveeva, E. G.; ...

2016-12-27

The effects of drifts and associated flows and currents on the width of the parallel heat flux channel (λ q) in the tokamak scrape-off layer (SOL) are analyzed using the SOLPS-ITER 2D fluid transport code. Motivation is supplied by Goldston’s heuristic drift (HD) model for λ q, which yields the same approximately inverse poloidal magnetic field dependence seen in multi-machine regression. The analysis, focusing on a DIII-D H-mode discharge, reveals HD-like features, including comparable density and temperature fall-off lengths in the SOL, and up-down ion pressure asymmetry that allows net cross-separatrix ion magnetic drift flux to exceed net anomalous ionmore » flux. In experimentally relevant high-recycling cases, scans of both toroidal and poloidal magnetic field (B tor and B pol) are conducted, showing minimal λ q dependence on either component of the field. Insensitivity to B tor is expected, and suggests that SOLPS-ITER is effectively capturing some aspects of HD physics. Absence of λ q dependence on B pol, however, is inconsistent with both the HD model and experimental results. As a result, the inconsistency is attributed to strong variation in the parallel Mach number, which violates one of the premises of the HD model.« less
Detection of mouse liver cancer via a parallel iterative shrinkage method in hybrid optical/microcomputed tomography imaging

NASA Astrophysics Data System (ADS)

Wu, Ping; Liu, Kai; Zhang, Qian; Xue, Zhenwen; Li, Yongbao; Ning, Nannan; Yang, Xin; Li, Xingde; Tian, Jie

2012-12-01

Liver cancer is one of the most common malignant tumors worldwide. In order to enable the noninvasive detection of small liver tumors in mice, we present a parallel iterative shrinkage (PIS) algorithm for dual-modality tomography. It takes advantage of microcomputed tomography and multiview bioluminescence imaging, providing anatomical structure and bioluminescence intensity information to reconstruct the size and location of tumors. By incorporating prior knowledge of signal sparsity, we associate some mathematical strategies including specific smooth convex approximation, an iterative shrinkage operator, and affine subspace with the PIS method, which guarantees the accuracy, efficiency, and reliability for three-dimensional reconstruction. Then an in vivo experiment on the bead-implanted mouse has been performed to validate the feasibility of this method. The findings indicate that a tiny lesion less than 3 mm in diameter can be localized with a position bias no more than 1 mm the computational efficiency is one to three orders of magnitude faster than the existing algorithms; this approach is robust to the different regularization parameters and the lp norms. Finally, we have applied this algorithm to another in vivo experiment on an HCCLM3 orthotopic xenograft mouse model, which suggests the PIS method holds the promise for practical applications of whole-body cancer detection.
FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

NASA Astrophysics Data System (ADS)

Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Performance Models for the Spike Banded Linear System Solver

DOE PAGES

Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...

2011-01-01

With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners,more » compared to state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent predication capabilities of our model – based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters – platforms for which performance prediction is particularly challenging. Finally, we provide predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.« less
An Evaluation of an Algorithm for Linear Inequalities and Its Applications

NASA Technical Reports Server (NTRS)

Jurgensen, J.

1973-01-01

An algorithm is presented for obtaining a solution alpha to a set of inequalities (A alpha) 0 where A is an N x m-matrix and alpha is an m-vector. If the set of inequalities is consistant, then the algorithm is guaranteed to arrive at a solution in a finite number of steps. Also, if in the iteration, a negative vector is obtained, then the initial set of inequalities is inconsistant, and the iteration is terminated.
The optimal modified variational iteration method for the Lane-Emden equations with Neumann and Robin boundary conditions

NASA Astrophysics Data System (ADS)

Singh, Randhir; Das, Nilima; Kumar, Jitendra

2017-06-01

An effective analytical technique is proposed for the solution of the Lane-Emden equations. The proposed technique is based on the variational iteration method (VIM) and the convergence control parameter h . In order to avoid solving a sequence of nonlinear algebraic or complicated integrals for the derivation of unknown constant, the boundary conditions are used before designing the recursive scheme for solution. The series solutions are found which converges rapidly to the exact solution. Convergence analysis and error bounds are discussed. Accuracy, applicability of the method is examined by solving three singular problems: i) nonlinear Poisson-Boltzmann equation, ii) distribution of heat sources in the human head, iii) second-kind Lane-Emden equation.
On the self-similar solution to the Euler equations for an incompressible fluid in three dimensions

NASA Astrophysics Data System (ADS)

Pomeau, Yves

2018-03-01

The equations for a self-similar solution to an inviscid incompressible fluid are mapped into an integral equation that hopefully can be solved by iteration. It is argued that the exponents of the similarity are ruled by Kelvin's theorem of conservation of circulation. The end result is an iteration with a nonlinear term entering a kernel given by a 3D integral for a swirling flow, likely within reach of present-day computational power. Because of the slow decay of the similarity solution at large distances, its kinetic energy diverges, and some mathematical results excluding non-trivial solutions of the Euler equations in the self-similar case do not apply. xml:lang="fr"
A multi-level solution algorithm for steady-state Markov chains

NASA Technical Reports Server (NTRS)

Horton, Graham; Leutenegger, Scott T.

1993-01-01

A new iterative algorithm, the multi-level algorithm, for the numerical solution of steady state Markov chains is presented. The method utilizes a set of recursively coarsened representations of the original system to achieve accelerated convergence. It is motivated by multigrid methods, which are widely used for fast solution of partial differential equations. Initial results of numerical experiments are reported, showing significant reductions in computation time, often an order of magnitude or more, relative to the Gauss-Seidel and optimal SOR algorithms for a variety of test problems. The multi-level method is compared and contrasted with the iterative aggregation-disaggregation algorithm of Takahashi.
Producing Satisfactory Solutions to Scheduling Problems: An Iterative Constraint Relaxation Approach

NASA Technical Reports Server (NTRS)

Chien, S.; Gratch, J.

1994-01-01

One drawback to using constraint-propagation in planning and scheduling systems is that when a problem has an unsatisfiable set of constraints such algorithms typically only show that no solution exists. While, technically correct, in practical situations, it is desirable in these cases to produce a satisficing solution that satisfies the most important constraints (typically defined in terms of maximizing a utility function). This paper describes an iterative constraint relaxation approach in which the scheduler uses heuristics to progressively relax problem constraints until the problem becomes satisfiable. We present empirical results of applying these techniques to the problem of scheduling spacecraft communications for JPL/NASA antenna resources.
A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller unit of tasks, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing run- time. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
Iterative dataset optimization in automated planning: Implementation for breast and rectal cancer radiotherapy.

PubMed

Fan, Jiawei; Wang, Jiazhou; Zhang, Zhen; Hu, Weigang

2017-06-01

To develop a new automated treatment planning solution for breast and rectal cancer radiotherapy. The automated treatment planning solution developed in this study includes selection of the iterative optimized training dataset, dose volume histogram (DVH) prediction for the organs at risk (OARs), and automatic generation of clinically acceptable treatment plans. The iterative optimized training dataset is selected by an iterative optimization from 40 treatment plans for left-breast and rectal cancer patients who received radiation therapy. A two-dimensional kernel density estimation algorithm (noted as two parameters KDE) which incorporated two predictive features was implemented to produce the predicted DVHs. Finally, 10 additional new left-breast treatment plans are re-planned using the Pinnacle 3 Auto-Planning (AP) module (version 9.10, Philips Medical Systems) with the objective functions derived from the predicted DVH curves. Automatically generated re-optimized treatment plans are compared with the original manually optimized plans. By combining the iterative optimized training dataset methodology and two parameters KDE prediction algorithm, our proposed automated planning strategy improves the accuracy of the DVH prediction. The automatically generated treatment plans using the dose derived from the predicted DVHs can achieve better dose sparing for some OARs without compromising other metrics of plan quality. The proposed new automated treatment planning solution can be used to efficiently evaluate and improve the quality and consistency of the treatment plans for intensity-modulated breast and rectal cancer radiation therapy. © 2017 American Association of Physicists in Medicine.

SISYPHUS: A high performance seismic inversion factory

NASA Astrophysics Data System (ADS)

Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas

2016-04-01

In the recent years the massively parallel high performance computers became the standard instruments for solving the forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling specially designed for such computers (SPECFEM3D, SES3D) became mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards the maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for the modern massively parallel high performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. The inversion state database represents a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios using a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; the work is in progress for interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).
On-line range images registration with GPGPU

NASA Astrophysics Data System (ADS)

Będkowski, J.; Naruniec, J.

2013-03-01

This paper concerns implementation of algorithms in the two important aspects of modern 3D data processing: data registration and segmentation. Solution proposed for the first topic is based on the 3D space decomposition, while the latter on image processing and local neighbourhood search. Data processing is implemented by using NVIDIA compute unified device architecture (NIVIDIA CUDA) parallel computation. The result of the segmentation is a coloured map where different colours correspond to different objects, such as walls, floor and stairs. The research is related to the problem of collecting 3D data with a RGB-D camera mounted on a rotated head, to be used in mobile robot applications. Performance of the data registration algorithm is aimed for on-line processing. The iterative closest point (ICP) approach is chosen as a registration method. Computations are based on the parallel fast nearest neighbour search. This procedure decomposes 3D space into cubic buckets and, therefore, the time of the matching is deterministic. First technique of the data segmentation uses accele-rometers integrated with a RGB-D sensor to obtain rotation compensation and image processing method for defining pre-requisites of the known categories. The second technique uses the adapted nearest neighbour search procedure for obtaining normal vectors for each range point.
Chebyshev polynomial filtered subspace iteration in the discontinuous Galerkin method for large-scale electronic structure calculations

DOE PAGES

Banerjee, Amartya S.; Lin, Lin; Hu, Wei; ...

2016-10-21

The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis (ALB) set to solve the Kohn-Sham equations of density functional theory in a discontinuous Galerkin framework. The adaptive local basis is generated on-the-fly to capture the local material physics and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. A central issue for large-scale calculations, however, is the computation of the electron density (and subsequently, ground state properties) from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) canmore » be used to address this issue and push the envelope in large-scale materials simulations in a discontinuous Galerkin framework. We describe how the subspace filtering steps can be performed in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to the orthogonality of the DG basis set and block-sparse structure of the DG Hamiltonian matrix. The on-the-fly nature of the ALB functions requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI approach in calculations of large-scale twodimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems. In conclusion, employing 55 296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8586 atoms is 90 s, and the time for a graphene sheet containing 11 520 atoms is 75 s.« less
Task 7: ADPAC User's Manual

NASA Technical Reports Server (NTRS)

Hall, E. J.; Topp, D. A.; Delaney, R. A.

1996-01-01

The overall objective of this study was to develop a 3-D numerical analysis for compressor casing treatment flowfields. The current version of the computer code resulting from this study is referred to as ADPAC (Advanced Ducted Propfan Analysis Codes-Version 7). This report is intended to serve as a computer program user's manual for the ADPAC code developed under Tasks 6 and 7 of the NASA Contract. The ADPAC program is based on a flexible multiple- block grid discretization scheme permitting coupled 2-D/3-D mesh block solutions with application to a wide variety of geometries. Aerodynamic calculations are based on a four-stage Runge-Kutta time-marching finite volume solution technique with added numerical dissipation. Steady flow predictions are accelerated by a multigrid procedure. An iterative implicit algorithm is available for rapid time-dependent flow calculations, and an advanced two equation turbulence model is incorporated to predict complex turbulent flows. The consolidated code generated during this study is capable of executing in either a serial or parallel computing mode from a single source code. Numerous examples are given in the form of test cases to demonstrate the utility of this approach for predicting the aerodynamics of modem turbomachinery configurations.
Sub-problem Optimization With Regression and Neural Network Approximators

NASA Technical Reports Server (NTRS)

Guptill, James D.; Hopkins, Dale A.; Patnaik, Surya N.

2003-01-01

Design optimization of large systems can be attempted through a sub-problem strategy. In this strategy, the original problem is divided into a number of smaller problems that are clustered together to obtain a sequence of sub-problems. Solution to the large problem is attempted iteratively through repeated solutions to the modest sub-problems. This strategy is applicable to structures and to multidisciplinary systems. For structures, clustering the substructures generates the sequence of sub-problems. For a multidisciplinary system, individual disciplines, accounting for coupling, can be considered as sub-problems. A sub-problem, if required, can be further broken down to accommodate sub-disciplines. The sub-problem strategy is being implemented into the NASA design optimization test bed, referred to as "CometBoards." Neural network and regression approximators are employed for reanalysis and sensitivity analysis calculations at the sub-problem level. The strategy has been implemented in sequential as well as parallel computational environments. This strategy, which attempts to alleviate algorithmic and reanalysis deficiencies, has the potential to become a powerful design tool. However, several issues have to be addressed before its full potential can be harnessed. This paper illustrates the strategy and addresses some issues.
Research in Computational Aeroscience Applications Implemented on Advanced Parallel Computing Systems

NASA Technical Reports Server (NTRS)

Wigton, Larry

1996-01-01

Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data

NASA Astrophysics Data System (ADS)

Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.

2015-12-01

Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework, that extends ApacheTM Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in ApacheTM Hadoop for parallel computing on a cluster, by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries. We evaluate performance of the various matrix libraries in distributed pipelines, such as Nd4jTM and BreezeTM. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.
DENSITY-DEPENDENT FLOW IN ONE-DIMENSIONAL VARIABLY-SATURATED MEDIA

EPA Science Inventory

A one-dimensional finite element is developed to simulate density-dependent flow of saltwater in variably saturated media. The flow and solute equations were solved in a coupled mode (iterative), in a partially coupled mode (non-iterative), and in a completely decoupled mode. P...
The MHOST finite element program: 3-D inelastic analysis methods for hot section components. Volume 1: Theoretical manual

NASA Technical Reports Server (NTRS)

Nakazawa, Shohei

1991-01-01

Formulations and algorithms implemented in the MHOST finite element program are discussed. The code uses a novel concept of the mixed iterative solution technique for the efficient 3-D computations of turbine engine hot section components. The general framework of variational formulation and solution algorithms are discussed which were derived from the mixed three field Hu-Washizu principle. This formulation enables the use of nodal interpolation for coordinates, displacements, strains, and stresses. Algorithmic description of the mixed iterative method includes variations for the quasi static, transient dynamic and buckling analyses. The global-local analysis procedure referred to as the subelement refinement is developed in the framework of the mixed iterative solution, of which the detail is presented. The numerically integrated isoparametric elements implemented in the framework is discussed. Methods to filter certain parts of strain and project the element discontinuous quantities to the nodes are developed for a family of linear elements. Integration algorithms are described for linear and nonlinear equations included in MHOST program.
An efficient algorithm using matrix methods to solve wind tunnel force-balance equations

NASA Technical Reports Server (NTRS)

Smith, D. L.

1972-01-01

An iterative procedure applying matrix methods to accomplish an efficient algorithm for automatic computer reduction of wind-tunnel force-balance data has been developed. Balance equations are expressed in a matrix form that is convenient for storing balance sensitivities and interaction coefficient values for online or offline batch data reduction. The convergence of the iterative values to a unique solution of this system of equations is investigated, and it is shown that for balances which satisfy the criteria discussed, this type of solution does occur. Methods for making sensitivity adjustments and initial load effect considerations in wind-tunnel applications are also discussed, and the logic for determining the convergence accuracy limits for the iterative solution is given. This more efficient data reduction program is compared with the technique presently in use at the NASA Langley Research Center, and computational times on the order of one-third or less are demonstrated by use of this new program.
Agile parallel bioinformatics workflow management using Pwrake.

PubMed

Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

2011-09-08

In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
Agile parallel bioinformatics workflow management using Pwrake

PubMed Central

2011-01-01

Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
Periodic Pulay method for robust and efficient convergence acceleration of self-consistent field iterations

DOE PAGES

Banerjee, Amartya S.; Suryanarayana, Phanish; Pask, John E.

2016-01-21

Pulay's Direct Inversion in the Iterative Subspace (DIIS) method is one of the most widely used mixing schemes for accelerating the self-consistent solution of electronic structure problems. In this work, we propose a simple generalization of DIIS in which Pulay extrapolation is performed at periodic intervals rather than on every self-consistent field iteration, and linear mixing is performed on all other iterations. Lastly, we demonstrate through numerical tests on a wide variety of materials systems in the framework of density functional theory that the proposed generalization of Pulay's method significantly improves its robustness and efficiency.
Asymptotic (h tending to infinity) absolute stability for BDFs applied to stiff differential equations. [Backward Differentiation Formulas

NASA Technical Reports Server (NTRS)

Krogh, F. T.; Stewart, K.

1984-01-01

Methods based on backward differentiation formulas (BDFs) for solving stiff differential equations require iterating to approximate the solution of the corrector equation on each step. One hope for reducing the cost of this is to make do with iteration matrices that are known to have errors and to do no more iterations than are necessary to maintain the stability of the method. This paper, following work by Klopfenstein, examines the effect of errors in the iteration matrix on the stability of the method. Application of the results to an algorithm is discussed briefly.
The modification of hybrid method of ant colony optimization, particle swarm optimization and 3-OPT algorithm in traveling salesman problem

NASA Astrophysics Data System (ADS)

Hertono, G. F.; Ubadah; Handari, B. D.

2018-03-01

The traveling salesman problem (TSP) is a famous problem in finding the shortest tour to visit every vertex exactly once, except the first vertex, given a set of vertices. This paper discusses three modification methods to solve TSP by combining Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO) and 3-Opt Algorithm. The ACO is used to find the solution of TSP, in which the PSO is implemented to find the best value of parameters α and β that are used in ACO.In order to reduce the total of tour length from the feasible solution obtained by ACO, then the 3-Opt will be used. In the first modification, the 3-Opt is used to reduce the total tour length from the feasible solutions obtained at each iteration, meanwhile, as the second modification, 3-Opt is used to reduce the total tour length from the entire solution obtained at every iteration. In the third modification, 3-Opt is used to reduce the total tour length from different solutions obtained at each iteration. Results are tested using 6 benchmark problems taken from TSPLIB by calculating the relative error to the best known solution as well as the running time. Among those modifications, only the second and third modification give satisfactory results except the second one needs more execution time compare to the third modifications.
Multilevel acceleration of scattering-source iterations with application to electron transport

DOE PAGES

Drumm, Clif; Fan, Wesley

2017-08-18

Acceleration/preconditioning strategies available in the SCEPTRE radiation transport code are described. A flexible transport synthetic acceleration (TSA) algorithm that uses a low-order discrete-ordinates (S N) or spherical-harmonics (P N) solve to accelerate convergence of a high-order S N source-iteration (SI) solve is described. Convergence of the low-order solves can be further accelerated by applying off-the-shelf incomplete-factorization or algebraic-multigrid methods. Also available is an algorithm that uses a generalized minimum residual (GMRES) iterative method rather than SI for convergence, using a parallel sweep-based solver to build up a Krylov subspace. TSA has been applied as a preconditioner to accelerate the convergencemore » of the GMRES iterations. The methods are applied to several problems involving electron transport and problems with artificial cross sections with large scattering ratios. These methods were compared and evaluated by considering material discontinuities and scattering anisotropy. Observed accelerations obtained are highly problem dependent, but speedup factors around 10 have been observed in typical applications.« less
New matrix bounds and iterative algorithms for the discrete coupled algebraic Riccati equation

NASA Astrophysics Data System (ADS)

Liu, Jianzhou; Wang, Li; Zhang, Juan

2017-11-01

The discrete coupled algebraic Riccati equation (DCARE) has wide applications in control theory and linear system. In general, for the DCARE, one discusses every term of the coupled term, respectively. In this paper, we consider the coupled term as a whole, which is different from the recent results. When applying eigenvalue inequalities to discuss the coupled term, our method has less error. In terms of the properties of special matrices and eigenvalue inequalities, we propose several upper and lower matrix bounds for the solution of DCARE. Further, we discuss the iterative algorithms for the solution of the DCARE. In the fixed point iterative algorithms, the scope of Lipschitz factor is wider than the recent results. Finally, we offer corresponding numerical examples to illustrate the effectiveness of the derived results.
Error analysis applied to several inversion techniques used for the retrieval of middle atmospheric constituents from limb-scanning MM-wave spectroscopic measurements

NASA Technical Reports Server (NTRS)

Puliafito, E.; Bevilacqua, R.; Olivero, J.; Degenhardt, W.

1992-01-01

The formal retrieval error analysis of Rodgers (1990) allows the quantitative determination of such retrieval properties as measurement error sensitivity, resolution, and inversion bias. This technique was applied to five numerical inversion techniques and two nonlinear iterative techniques used for the retrieval of middle atmospheric constituent concentrations from limb-scanning millimeter-wave spectroscopic measurements. It is found that the iterative methods have better vertical resolution, but are slightly more sensitive to measurement error than constrained matrix methods. The iterative methods converge to the exact solution, whereas two of the matrix methods under consideration have an explicit constraint, the sensitivity of the solution to the a priori profile. Tradeoffs of these retrieval characteristics are presented.
Solving Differential Equations Using Modified Picard Iteration

ERIC Educational Resources Information Center

Robin, W. A.

2010-01-01

Many classes of differential equations are shown to be open to solution through a method involving a combination of a direct integration approach with suitably modified Picard iterative procedures. The classes of differential equations considered include typical initial value, boundary value and eigenvalue problems arising in physics and…
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.

HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kannan, Ramakrishnan; Sukumar, Sreenivas R.; Ballard, Grey M.

NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems formore » $$\\WW$$ and $$\\HH$$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $$\\WW$$ and $$\\HH$$ within the alternating iterations.« less
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods

PubMed Central

Smith, David S.; Gore, John C.; Yankeelov, Thomas E.; Welch, E. Brian

2012-01-01

Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 40962 or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 10242 and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images. PMID:22481908
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods.

PubMed

Smith, David S; Gore, John C; Yankeelov, Thomas E; Welch, E Brian

2012-01-01

Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 4096(2) or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 1024(2) and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images.
A Navier-Stokes solution of the three-dimensional viscous compressible flow in a centrifugal compressor impeller

NASA Technical Reports Server (NTRS)

Harp, J. L., Jr.

1977-01-01

A two-dimensional time-dependent computer code was utilized to calculate the three-dimensional steady flow within the impeller blading. The numerical method is an explicit time marching scheme in two spatial dimensions. Initially, an inviscid solution is generated on the hub blade-to-blade surface by the method of Katsanis and McNally (1973). Starting with the known inviscid solution, the viscous effects are calculated through iteration. The approach makes it possible to take into account principal impeller fluid-mechanical effects. It is pointed out that the second iterate provides a complete solution to the three-dimensional, compressible, Navier-Stokes equations for flow in a centrifugal impeller. The problems investigated are related to the study of a radial impeller and a backswept impeller.
Energy-Efficient Cognitive Radio Sensor Networks: Parametric and Convex Transformations

PubMed Central

Naeem, Muhammad; Illanko, Kandasamy; Karmokar, Ashok; Anpalagan, Alagan; Jaseemuddin, Muhammad

2013-01-01

Designing energy-efficient cognitive radio sensor networks is important to intelligently use battery energy and to maximize the sensor network life. In this paper, the problem of determining the power allocation that maximizes the energy-efficiency of cognitive radio-based wireless sensor networks is formed as a constrained optimization problem, where the objective function is the ratio of network throughput and the network power. The proposed constrained optimization problem belongs to a class of nonlinear fractional programming problems. Charnes-Cooper Transformation is used to transform the nonlinear fractional problem into an equivalent concave optimization problem. The structure of the power allocation policy for the transformed concave problem is found to be of a water-filling type. The problem is also transformed into a parametric form for which a ε-optimal iterative solution exists. The convergence of the iterative algorithms is proven, and numerical solutions are presented. The iterative solutions are compared with the optimal solution obtained from the transformed concave problem, and the effects of different system parameters (interference threshold level, the number of primary users and secondary sensor nodes) on the performance of the proposed algorithms are investigated. PMID:23966194
On the use of flux limiters in the discrete ordinates method for 3D radiation calculations in absorbing and scattering media

NASA Astrophysics Data System (ADS)

Godoy, William F.; DesJardin, Paul E.

2010-05-01

The application of flux limiters to the discrete ordinates method (DOM), SN, for radiative transfer calculations is discussed and analyzed for 3D enclosures for cases in which the intensities are strongly coupled to each other such as: radiative equilibrium and scattering media. A Newton-Krylov iterative method (GMRES) solves the final systems of linear equations along with a domain decomposition strategy for parallel computation using message passing libraries in a distributed memory system. Ray effects due to angular discretization and errors due to domain decomposition are minimized until small variations are introduced by these effects in order to focus on the influence of flux limiters on errors due to spatial discretization, known as numerical diffusion, smearing or false scattering. Results are presented for the DOM-integrated quantities such as heat flux, irradiation and emission. A variety of flux limiters are compared to "exact" solutions available in the literature, such as the integral solution of the RTE for pure absorbing-emitting media and isotropic scattering cases and a Monte Carlo solution for a forward scattering case. Additionally, a non-homogeneous 3D enclosure is included to extend the use of flux limiters to more practical cases. The overall balance of convergence, accuracy, speed and stability using flux limiters is shown to be superior compared to step schemes for any test case.
Segmentation of remotely sensed data using parallel region growing

NASA Technical Reports Server (NTRS)

Tilton, J. C.; Cox, S. C.

1983-01-01

The improved spatial resolution of the new earth resources satellites will increase the need for effective utilization of spatial information in machine processing of remotely sensed data. One promising technique is scene segmentation by region growing. Region growing can use spatial information in two ways: only spatially adjacent regions merge together, and merging criteria can be based on region-wide spatial features. A simple region growing approach is described in which the similarity criterion is based on region mean and variance (a simple spatial feature). An effective way to implement region growing for remote sensing is as an iterative parallel process on a large parallel processor. A straightforward parallel pixel-based implementation of the algorithm is explored and its efficiency is compared with sequential pixel-based, sequential region-based, and parallel region-based implementations. Experimental results from on aircraft scanner data set are presented, as is a discussioon of proposed improvements to the segmentation algorithm.
Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

2005-01-01

The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems, arised from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should fully take advantages of multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective. the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural, and acoustic (symmetrical and un-symmetrical) problems (in different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
User's Guide for ENSAERO_FE Parallel Finite Element Solver

NASA Technical Reports Server (NTRS)

Eldred, Lloyd B.; Guruswamy, Guru P.

1999-01-01

A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRANKOSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
Exact solitary wave solution for higher order nonlinear Schrodinger equation using He's variational iteration method

NASA Astrophysics Data System (ADS)

Rani, Monika; Bhatti, Harbax S.; Singh, Vikramjeet

2017-11-01

In optical communication, the behavior of the ultrashort pulses of optical solitons can be described through nonlinear Schrodinger equation. This partial differential equation is widely used to contemplate a number of physically important phenomena, including optical shock waves, laser and plasma physics, quantum mechanics, elastic media, etc. The exact analytical solution of (1+n)-dimensional higher order nonlinear Schrodinger equation by He's variational iteration method has been presented. Our proposed solutions are very helpful in studying the solitary wave phenomena and ensure rapid convergent series and avoid round off errors. Different examples with graphical representations have been given to justify the capability of the method.
An implicit-iterative solution of the heat conduction equation with a radiation boundary condition

NASA Technical Reports Server (NTRS)

Williams, S. D.; Curry, D. M.

1977-01-01

For the problem of predicting one-dimensional heat transfer between conducting and radiating mediums by an implicit finite difference method, four different formulations were used to approximate the surface radiation boundary condition while retaining an implicit formulation for the interior temperature nodes. These formulations are an explicit boundary condition, a linearized boundary condition, an iterative boundary condition, and a semi-iterative boundary method. The results of these methods in predicting surface temperature on the space shuttle orbiter thermal protection system model under a variety of heating rates were compared. The iterative technique caused the surface temperature to be bounded at each step. While the linearized and explicit methods were generally more efficient, the iterative and semi-iterative techniques provided a realistic surface temperature response without requiring step size control techniques.
Non-Cartesian Parallel Imaging Reconstruction

PubMed Central

Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

2014-01-01

Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499
ILUBCG2-11: Solution of 11-banded nonsymmetric linear equation systems by a preconditioned biconjugate gradient routine

NASA Astrophysics Data System (ADS)

Chen, Y.-M.; Koniges, A. E.; Anderson, D. V.

1989-10-01

The biconjugate gradient method (BCG) provides an attractive alternative to the usual conjugate gradient algorithms for the solution of sparse systems of linear equations with nonsymmetric and indefinite matrix operators. A preconditioned algorithm is given, whose form resembles the incomplete L-U conjugate gradient scheme (ILUCG2) previously presented. Although the BCG scheme requires the storage of two additional vectors, it converges in a significantly lesser number of iterations (often half), while the number of calculations per iteration remains essentially the same.
A system of nonlinear set valued variational inclusions.

PubMed

Tang, Yong-Kun; Chang, Shih-Sen; Salahuddin, Salahuddin

2014-01-01

In this paper, we studied the existence theorems and techniques for finding the solutions of a system of nonlinear set valued variational inclusions in Hilbert spaces. To overcome the difficulties, due to the presence of a proper convex lower semicontinuous function ϕ and a mapping g which appeared in the considered problems, we have used the resolvent operator technique to suggest an iterative algorithm to compute approximate solutions of the system of nonlinear set valued variational inclusions. The convergence of the iterative sequences generated by algorithm is also proved. 49J40; 47H06.
Techniques for Accelerating Iterative Methods for the Solution of Mathematical Problems

DTIC Science & Technology

1989-07-01

m, we can find a solu ion to the problem by using generalized inverses. Hence, ;= Ih.i = GAi = G - where G is of the form (18). A simple choice for V...have understood why I was not available for many of their activities and not home many of the nights. Their love is forever. I have saved the best for...Xk) Extrapolation applied to terms xP through Xk F Operator on x G Iteration function Ik Identity matrix of rank k Solution of the problem or the limit
Implementation on a nonlinear concrete cracking algorithm in NASTRAN

NASA Technical Reports Server (NTRS)

Herting, D. N.; Herendeen, D. L.; Hoesly, R. L.; Chang, H.

1976-01-01

A computer code for the analysis of reinforced concrete structures was developed using NASTRAN as a basis. Nonlinear iteration procedures were developed for obtaining solutions with a wide variety of loading sequences. A direct access file system was used to save results at each load step to restart within the solution module for further analysis. A multi-nested looping capability was implemented to control the iterations and change the loads. The basis for the analysis is a set of mutli-layer plate elements which allow local definition of materials and cracking properties.
Applied Computational Electromagnetics Society Journal, Volume 7, Number 2, Winter 1992,

DTIC Science & Technology

1992-01-01

Rensselaer Polytechnic Institute University of Utah Las Cruces. NM USA -’roy. NY USA ,. Salt Lake City. UT T SA J.R. James A. Konrad D.V. Land RMCS Cranfleld...and found a strong correspondence between the two. Figures la -b show the equation and solution residuals as functions of iteration for small...I I I I I I I I I 4510 Is 31 25 16 35 .0 45 12 Figure la : Normalized solution and equation residuals as functions of iteration for a small sphere
Iterative methods used in overlap astrometric reduction techniques do not always converge

NASA Astrophysics Data System (ADS)

Rapaport, M.; Ducourant, C.; Colin, J.; Le Campion, J. F.

1993-04-01

In this paper we prove that the classical Gauss-Seidel type iterative methods used for the solution of the reduced normal equations occurring in overlapping reduction methods of astrometry do not always converge. We exhibit examples of divergence. We then analyze an alternative algorithm proposed by Wang (1985). We prove the consistency of this algorithm and verify that it can be convergent while the Gauss-Seidel method is divergent. We conjecture the convergence of Wang method for the solution of astrometric problems using overlap techniques.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE PAGES

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...

2017-01-28

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less

Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
Nonnegative least-squares image deblurring: improved gradient projection approaches

NASA Astrophysics Data System (ADS)

Benvenuto, F.; Zanella, R.; Zanni, L.; Bertero, M.

2010-02-01

The least-squares approach to image deblurring leads to an ill-posed problem. The addition of the nonnegativity constraint, when appropriate, does not provide regularization, even if, as far as we know, a thorough investigation of the ill-posedness of the resulting constrained least-squares problem has still to be done. Iterative methods, converging to nonnegative least-squares solutions, have been proposed. Some of them have the 'semi-convergence' property, i.e. early stopping of the iteration provides 'regularized' solutions. In this paper we consider two of these methods: the projected Landweber (PL) method and the iterative image space reconstruction algorithm (ISRA). Even if they work well in many instances, they are not frequently used in practice because, in general, they require a large number of iterations before providing a sensible solution. Therefore, the main purpose of this paper is to refresh these methods by increasing their efficiency. Starting from the remark that PL and ISRA require only the computation of the gradient of the functional, we propose the application to these algorithms of special acceleration techniques that have been recently developed in the area of the gradient methods. In particular, we propose the application of efficient step-length selection rules and line-search strategies. Moreover, remarking that ISRA is a scaled gradient algorithm, we evaluate its behaviour in comparison with a recent scaled gradient projection (SGP) method for image deblurring. Numerical experiments demonstrate that the accelerated methods still exhibit the semi-convergence property, with a considerable gain both in the number of iterations and in the computational time; in particular, SGP appears definitely the most efficient one.
Virtual fringe projection system with nonparallel illumination based on iteration

NASA Astrophysics Data System (ADS)

Zhou, Duo; Wang, Zhangying; Gao, Nan; Zhang, Zonghua; Jiang, Xiangqian

2017-06-01

Fringe projection profilometry has been widely applied in many fields. To set up an ideal measuring system, a virtual fringe projection technique has been studied to assist in the design of hardware configurations. However, existing virtual fringe projection systems use parallel illumination and have a fixed optical framework. This paper presents a virtual fringe projection system with nonparallel illumination. Using an iterative method to calculate intersection points between rays and reference planes or object surfaces, the proposed system can simulate projected fringe patterns and captured images. A new explicit calibration method has been presented to validate the precision of the system. Simulated results indicate that the proposed iterative method outperforms previous systems. Our virtual system can be applied to error analysis, algorithm optimization, and help operators to find ideal system parameter settings for actual measurements.
Variable aperture-based ptychographical iterative engine method

NASA Astrophysics Data System (ADS)

Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

2018-02-01

A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. By adjusting the size of a tiny aperture under the illumination of a parallel light beam to change the illumination on the sample step by step and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since many fewer diffraction patterns are required than in common PIE and the shape, the size, and the position of the aperture need not to be known exactly, this proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; therefore, the proposed technique can be potentially applied for various scientific researches.
Kinematics and design of a class of parallel manipulators

NASA Astrophysics Data System (ADS)

Hertz, Roger Barry

1998-12-01

This dissertation is concerned with the kinematic analysis and design of a class of three degree-of-freedom, spatial parallel manipulators. The class of manipulators is characterized by two platforms, between which are three legs, each possessing a succession of revolute, spherical, and revolute joints. The class is termed the "revolute-spherical-revolute" class of parallel manipulators. Two members of this class are examined. The first mechanism is a double-octahedral variable-geometry truss, and the second is termed a double tripod. The history the mechanisms is explored---the variable-geometry truss dates back to 1984, while predecessors of the double tripod mechanism date back to 1869. This work centers on the displacement analysis of these three-degree-of-freedom mechanisms. Two types of problem are solved: the forward displacement analysis (forward kinematics) and the inverse displacement analysis (inverse kinematics). The kinematic model of the class of mechanism is general in nature. A classification scheme for the revolute-spherical-revolute class of mechanism is introduced, which uses dominant geometric features to group designs into 8 different sub-classes. The forward kinematics problem is discussed: given a set of independently controllable input variables, solve for the relative position and orientation between the two platforms. For the variable-geometry truss, the controllable input variables are assumed to be the linear (prismatic) joints. For the double tripod, the controllable input variables are the three revolute joints adjacent to the base (proximal) platform. Multiple solutions are presented to the forward kinematics problem, indicating that there are many different positions (assemblies) that the manipulator can assume with equivalent inputs. For the double tripod these solutions can be expressed as a 16th degree polynomial in one unknown, while for the variable-geometry truss there exist two 16th degree polynomials, giving rise to 256 solutions. For special cases of the double tripod, the forward kinematics problem is shown to have a closed-form solution. Numerical examples are presented for the solution to the forward kinematics. A double tripod is presented that admits 16 unique and real forward kinematics solutions. Another example for a variable geometry truss is given that possesses 64 real solutions: 8 for each 16th order polynomial. The inverse kinematics problem is also discussed: given the relative position of the hand (end-effector), which is rigidly attached to one platform, solve for the independently controlled joint variables. Iterative solutions are proposed for both the variable-geometry truss and the double tripod. For special cases of both mechanisms, closed-form solutions are given. The practical problems of designing, building, and controlling a double-tripod manipulator are addressed. The resulting manipulator is a first-of-its kind prototype of a tapered (asymmetric) double-tripod manipulator. Real-time forward and inverse kinematics algorithms on an industrial robot controller is presented. The resulting performance of the prototype is impressive, since it was to achieve a maximum tool-tip speed of 4064 mm/s, maximum acceleration of 5 g, and a cycle time of 1.2 seconds for a typical pick-and-place pattern.
Accurate numerical solution of the Helmholtz equation by iterative Lanczos reduction.

PubMed

Ratowsky, R P; Fleck, J A

1991-06-01

The Lanczos recursion algorithm is used to determine forward-propagating solutions for both the paraxial and Helmholtz wave equations for longitudinally invariant refractive indices. By eigenvalue analysis it is demonstrated that the method gives extremely accurate solutions to both equations.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation

NASA Astrophysics Data System (ADS)

Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.

2011-03-01

A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jing Yanfei, E-mail: yanfeijing@uestc.edu.c; Huang Tingzhu, E-mail: tzhuang@uestc.edu.c; Duan Yong, E-mail: duanyong@yahoo.c

This study is mainly focused on iterative solutions with simple diagonal preconditioning to two complex-valued nonsymmetric systems of linear equations arising from a computational chemistry model problem proposed by Sherry Li of NERSC. Numerical experiments show the feasibility of iterative methods to some extent when applied to the problems and reveal the competitiveness of our recently proposed Lanczos biconjugate A-orthonormalization methods to other classic and popular iterative methods. By the way, experiment results also indicate that application specific preconditioners may be mandatory and required for accelerating convergence.
Iterative methods for tomography problems: implementation to a cross-well tomography problem

NASA Astrophysics Data System (ADS)

Karadeniz, M. F.; Weber, G. W.

2018-01-01

The velocity distribution between two boreholes is reconstructed by cross-well tomography, which is commonly used in geology. In this paper, iterative methods, Kaczmarz’s algorithm, algebraic reconstruction technique (ART), and simultaneous iterative reconstruction technique (SIRT), are implemented to a specific cross-well tomography problem. Convergence to the solution of these methods and their CPU time for the cross-well tomography problem are compared. Furthermore, these three methods for this problem are compared for different tolerance values.
Iterative and multigrid methods in the finite element solution of incompressible and turbulent fluid flow

NASA Astrophysics Data System (ADS)

Lavery, N.; Taylor, C.

1999-07-01

Multigrid and iterative methods are used to reduce the solution time of the matrix equations which arise from the finite element (FE) discretisation of the time-independent equations of motion of the incompressible fluid in turbulent motion. Incompressible flow is solved by using the method of reduce interpolation for the pressure to satisfy the Brezzi-Babuska condition. The k-l model is used to complete the turbulence closure problem. The non-symmetric iterative matrix methods examined are the methods of least squares conjugate gradient (LSCG), biconjugate gradient (BCG), conjugate gradient squared (CGS), and the biconjugate gradient squared stabilised (BCGSTAB). The multigrid algorithm applied is based on the FAS algorithm of Brandt, and uses two and three levels of grids with a V-cycling schedule. These methods are all compared to the non-symmetric frontal solver. Copyright
Analysis of the iteratively regularized Gauss-Newton method under a heuristic rule

NASA Astrophysics Data System (ADS)

Jin, Qinian; Wang, Wei

2018-03-01

The iteratively regularized Gauss-Newton method is one of the most prominent regularization methods for solving nonlinear ill-posed inverse problems when the data is corrupted by noise. In order to produce a useful approximate solution, this iterative method should be terminated properly. The existing a priori and a posteriori stopping rules require accurate information on the noise level, which may not be available or reliable in practical applications. In this paper we propose a heuristic selection rule for this regularization method, which requires no information on the noise level. By imposing certain conditions on the noise, we derive a posteriori error estimates on the approximate solutions under various source conditions. Furthermore, we establish a convergence result without using any source condition. Numerical results are presented to illustrate the performance of our heuristic selection rule.
Vortex breakdown simulation

NASA Technical Reports Server (NTRS)

Hafez, M.; Ahmad, J.; Kuruvila, G.; Salas, M. D.

1987-01-01

In this paper, steady, axisymmetric inviscid, and viscous (laminar) swirling flows representing vortex breakdown phenomena are simulated using a stream function-vorticity-circulation formulation and two numerical methods. The first is based on an inverse iteration, where a norm of the solution is prescribed and the swirling parameter is calculated as a part of the output. The second is based on direct Newton iterations, where the linearized equations, for all the unknowns, are solved simultaneously by an efficient banded Gaussian elimination procedure. Several numerical solutions for inviscid and viscous flows are demonstrated, followed by a discussion of the results. Some improvements on previous work have been achieved: first order upwind differences are replaced by second order schemes, line relaxation procedure (with linear convergence rate) is replaced by Newton's iterations (which converge quadratically), and Reynolds numbers are extended from 200 up to 1000.
Optimization and validation of accelerated golden-angle radial sparse MRI reconstruction with self-calibrating GRAPPA operator gridding.

PubMed

Benkert, Thomas; Tian, Ye; Huang, Chenchan; DiBella, Edward V R; Chandarana, Hersh; Feng, Li

2018-07-01

Golden-angle radial sparse parallel (GRASP) MRI reconstruction requires gridding and regridding to transform data between radial and Cartesian k-space. These operations are repeatedly performed in each iteration, which makes the reconstruction computationally demanding. This work aimed to accelerate GRASP reconstruction using self-calibrating GRAPPA operator gridding (GROG) and to validate its performance in clinical imaging. GROG is an alternative gridding approach based on parallel imaging, in which k-space data acquired on a non-Cartesian grid are shifted onto a Cartesian k-space grid using information from multicoil arrays. For iterative non-Cartesian image reconstruction, GROG is performed only once as a preprocessing step. Therefore, the subsequent iterative reconstruction can be performed directly in Cartesian space, which significantly reduces computational burden. Here, a framework combining GROG with GRASP (GROG-GRASP) is first optimized and then compared with standard GRASP reconstruction in 22 prostate patients. GROG-GRASP achieved approximately 4.2-fold reduction in reconstruction time compared with GRASP (∼333 min versus ∼78 min) while maintaining image quality (structural similarity index ≈ 0.97 and root mean square error ≈ 0.007). Visual image quality assessment by two experienced radiologists did not show significant differences between the two reconstruction schemes. With a graphics processing unit implementation, image reconstruction time can be further reduced to approximately 14 min. The GRASP reconstruction can be substantially accelerated using GROG. This framework is promising toward broader clinical application of GRASP and other iterative non-Cartesian reconstruction methods. Magn Reson Med 80:286-293, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
An Iterative Method for Problems with Multiscale Conductivity

PubMed Central

Kim, Hyea Hyun; Minhas, Atul S.; Woo, Eung Je

2012-01-01

A model with its conductivity varying highly across a very thin layer will be considered. It is related to a stable phantom model, which is invented to generate a certain apparent conductivity inside a region surrounded by a thin cylinder with holes. The thin cylinder is an insulator and both inside and outside the thin cylinderare filled with the same saline. The injected current can enter only through the holes adopted to the thin cylinder. The model has a high contrast of conductivity discontinuity across the thin cylinder and the thickness of the layer and the size of holes are very small compared to the domain of the model problem. Numerical methods for such a model require a very fine mesh near the thin layer to resolve the conductivity discontinuity. In this work, an efficient numerical method for such a model problem is proposed by employing a uniform mesh, which need not resolve the conductivity discontinuity. The discrete problem is then solved by an iterative method, where the solution is improved by solving a simple discrete problem with a uniform conductivity. At each iteration, the right-hand side is updated by integrating the previous iterate over the thin cylinder. This process results in a certain smoothing effect on microscopic structures and our discrete model can provide a more practical tool for simulating the apparent conductivity. The convergence of the iterative method is analyzed regarding the contrast in the conductivity and the relative thickness of the layer. In numerical experiments, solutions of our method are compared to reference solutions obtained from COMSOL, where very fine meshes are used to resolve the conductivity discontinuity in the model. Errors of the voltage in L2 norm follow O(h) asymptotically and the current density matches quitewell those from the reference solution for a sufficiently small mesh size h. The experimental results present a promising feature of our approach for simulating the apparent conductivity related to changes in microscopic cellular structures. PMID:23304238
Analysis of surface wave propagation in a grounded dielectric slab covered by a resistive sheet

NASA Technical Reports Server (NTRS)

Shively, David G.

1992-01-01

Both parallel and perpendicular polarized surface waves are known to propagate on lossless and lossy grounded dielectric slabs. Surface wave propagation on a grounded dielectric slab covered with a resistive sheet is considered. Both parallel and perpendicular polarizations are examined. Transcendental equations are derived for each polarization and are solved using iterative techniques. Attenuation and phase velocity are shown for representative geometries. The results are applicable to both a grounded slab with a resistive sheet and an ungrounded slab covered on each side with a resistive sheet.
Development of an efficient multigrid method for the NEM form of the multigroup neutron diffusion equation

NASA Astrophysics Data System (ADS)

Al-Chalabi, Rifat M. Khalil

1997-09-01

Development of an improvement to the computational efficiency of the existing nested iterative solution strategy of the Nodal Exapansion Method (NEM) nodal based neutron diffusion code NESTLE is presented. The improvement in the solution strategy is the result of developing a multilevel acceleration scheme that does not suffer from the numerical stalling associated with a number of iterative solution methods. The acceleration scheme is based on the multigrid method, which is specifically adapted for incorporation into the NEM nonlinear iterative strategy. This scheme optimizes the computational interplay between the spatial discretization and the NEM nonlinear iterative solution process through the use of the multigrid method. The combination of the NEM nodal method, calculation of the homogenized, neutron nodal balance coefficients (i.e. restriction operator), efficient underlying smoothing algorithm (power method of NESTLE), and the finer mesh reconstruction algorithm (i.e. prolongation operator), all operating on a sequence of coarser spatial nodes, constitutes the multilevel acceleration scheme employed in this research. Two implementations of the multigrid method into the NESTLE code were examined; the Imbedded NEM Strategy and the Imbedded CMFD Strategy. The main difference in implementation between the two methods is that in the Imbedded NEM Strategy, the NEM solution is required at every MG level. Numerical tests have shown that the Imbedded NEM Strategy suffers from divergence at coarse- grid levels, hence all the results for the different benchmarks presented here were obtained using the Imbedded CMFD Strategy. The novelties in the developed MG method are as follows: the formulation of the restriction and prolongation operators, and the selection of the relaxation method. The restriction operator utilizes a variation of the reactor physics, consistent homogenization technique. The prolongation operator is based upon a variant of the pin power reconstruction methodology. The relaxation method, which is the power method, utilizes a constant coefficient matrix within the NEM non-linear iterative strategy. The choice of the MG nesting within the nested iterative strategy enables the incorporation of other non-linear effects with no additional coding effort. In addition, if an eigenvalue problem is being solved, it remains an eigenvalue problem at all grid levels, simplifying coding implementation. The merit of the developed MG method was tested by incorporating it into the NESTLE iterative solver, and employing it to solve four different benchmark problems. In addition to the base cases, three different sensitivity studies are performed, examining the effects of number of MG levels, homogenized coupling coefficients correction (i.e. restriction operator), and fine-mesh reconstruction algorithm (i.e. prolongation operator). The multilevel acceleration scheme developed in this research provides the foundation for developing adaptive multilevel acceleration methods for steady-state and transient NEM nodal neutron diffusion equations. (Abstract shortened by UMI.)
Iterative discrete ordinates solution of the equation for surface-reflected radiance

NASA Astrophysics Data System (ADS)

Radkevich, Alexander

2017-11-01

This paper presents a new method of numerical solution of the integral equation for the radiance reflected from an anisotropic surface. The equation relates the radiance at the surface level with BRDF and solutions of the standard radiative transfer problems for a slab with no reflection on its surfaces. It is also shown that the kernel of the equation satisfies the condition of the existence of a unique solution and the convergence of the successive approximations to that solution. The developed method features two basic steps: discretization on a 2D quadrature, and solving the resulting system of algebraic equations with successive over-relaxation method based on the Gauss-Seidel iterative process. Presented numerical examples show good coincidence between the surface-reflected radiance obtained with DISORT and the proposed method. Analysis of contributions of the direct and diffuse (but not yet reflected) parts of the downward radiance to the total solution is performed. Together, they represent a very good initial guess for the iterative process. This fact ensures fast convergence. The numerical evidence is given that the fastest convergence occurs with the relaxation parameter of 1 (no relaxation). An integral equation for BRDF is derived as inversion of the original equation. The potential of this new equation for BRDF retrievals is analyzed. The approach is found not viable as the BRDF equation appears to be an ill-posed problem, and it requires knowledge the surface-reflected radiance on the entire domain of both Sun and viewing zenith angles.
Sometimes "Newton's Method" Always "Cycles"

ERIC Educational Resources Information Center

Latulippe, Joe; Switkes, Jennifer

2012-01-01

Are there functions for which Newton's method cycles for all non-trivial initial guesses? We construct and solve a differential equation whose solution is a real-valued function that two-cycles under Newton iteration. Higher-order cycles of Newton's method iterates are explored in the complex plane using complex powers of "x." We find a class of…
Error Control Coding Techniques for Space and Satellite Communications

NASA Technical Reports Server (NTRS)

Costello, Daniel J., Jr.; Takeshita, Oscar Y.; Cabral, Hermano A.; He, Jiali; White, Gregory S.

1997-01-01

Turbo coding using iterative SOVA decoding and M-ary differentially coherent or non-coherent modulation can provide an effective coding modulation solution: (1) Energy efficient with relatively simple SOVA decoding and small packet lengths, depending on BEP required; (2) Low number of decoding iterations required; and (3) Robustness in fading with channel interleaving.
Developing DIII-D To Prepare For ITER And The Path To Fusion Energy

NASA Astrophysics Data System (ADS)

Buttery, Richard; Hill, David; Solomon, Wayne; Guo, Houyang; DIII-D Team

2017-10-01

DIII-D pursues the advancement of fusion energy through scientific understanding and discovery of solutions. Research targets two key goals. First, to prepare for ITER we must resolve how to use its flexible control tools to rapidly reach Q =10, and develop the scientific basis to interpret results from ITER for fusion projection. Second, we must determine how to sustain a high performance fusion core in steady state conditions, with minimal actuators and a plasma exhaust solution. DIII-D will target these missions with: (i) increased electron heating and balanced torque neutral beams to simulate burning plasma conditions (ii) new 3D coil arrays to resolve control of transients (iii) off axis current drive to study physics in steady state regimes (iv) divertors configurations to promote detachment with low upstream density (v) a reactor relevant wall to qualify materials and resolve physics in reactor-like conditions. With new diagnostics and leading edge simulation, this will position the US for success in ITER and a unique knowledge to accelerate the approach to fusion energy. Supported by the US DOE under DE-FC02-04ER54698.

A globally convergent Lagrange and barrier function iterative algorithm for the traveling salesman problem.

PubMed

Dang, C; Xu, L

2001-03-01

In this paper a globally convergent Lagrange and barrier function iterative algorithm is proposed for approximating a solution of the traveling salesman problem. The algorithm employs an entropy-type barrier function to deal with nonnegativity constraints and Lagrange multipliers to handle linear equality constraints, and attempts to produce a solution of high quality by generating a minimum point of a barrier problem for a sequence of descending values of the barrier parameter. For any given value of the barrier parameter, the algorithm searches for a minimum point of the barrier problem in a feasible descent direction, which has a desired property that the nonnegativity constraints are always satisfied automatically if the step length is a number between zero and one. At each iteration the feasible descent direction is found by updating Lagrange multipliers with a globally convergent iterative procedure. For any given value of the barrier parameter, the algorithm converges to a stationary point of the barrier problem without any condition on the objective function. Theoretical and numerical results show that the algorithm seems more effective and efficient than the softassign algorithm.
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Loikith, P.; Lee, H.; McGibbney, L. J.; Whitehall, K. D.

2014-12-01

Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning fast Big Data technology called SciSpark based on ApacheTM Spark. Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based ApacheTM Hadoop by 100x in memory and by 10x on disk, and makes iterative algorithms feasible. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 100 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning (ML) based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesocale Convective Complexes. The goals of SciSpark are to: (1) Decrease the time to compute comparison statistics and plots from minutes to seconds; (2) Allow for interactive exploration of time-series properties over seasons and years; (3) Decrease the time for satellite data ingestion into RCMES to hours; (4) Allow for Level-2 comparisons with higher-order statistics or PDF's in minutes to hours; and (5) Move RCMES into a near real time decision-making platform. We will report on: the architecture and design of SciSpark, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning (sharding) of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDF's, and first metrics quantifying parallel speedups and memory & disk usage.
Fourth-order numerical solutions of diffusion equation by using SOR method with Crank-Nicolson approach

NASA Astrophysics Data System (ADS)

Muhiddin, F. A.; Sulaiman, J.

2017-09-01

The aim of this paper is to investigate the effectiveness of the Successive Over-Relaxation (SOR) iterative method by using the fourth-order Crank-Nicolson (CN) discretization scheme to derive a five-point Crank-Nicolson approximation equation in order to solve diffusion equation. From this approximation equation, clearly, it can be shown that corresponding system of five-point approximation equations can be generated and then solved iteratively. In order to access the performance results of the proposed iterative method with the fourth-order CN scheme, another point iterative method which is Gauss-Seidel (GS), also presented as a reference method. Finally the numerical results obtained from the use of the fourth-order CN discretization scheme, it can be pointed out that the SOR iterative method is superior in terms of number of iterations, execution time, and maximum absolute error.
Spacecraft Attitude Maneuver Planning Using Genetic Algorithms

NASA Technical Reports Server (NTRS)

Kornfeld, Richard P.

2004-01-01

A key enabling technology that leads to greater spacecraft autonomy is the capability to autonomously and optimally slew the spacecraft from and to different attitudes while operating under a number of celestial and dynamic constraints. The task of finding an attitude trajectory that meets all the constraints is a formidable one, in particular for orbiting or fly-by spacecraft where the constraints and initial and final conditions are of time-varying nature. This approach for attitude path planning makes full use of a priori constraint knowledge and is computationally tractable enough to be executed onboard a spacecraft. The approach is based on incorporating the constraints into a cost function and using a Genetic Algorithm to iteratively search for and optimize the solution. This results in a directed random search that explores a large part of the solution space while maintaining the knowledge of good solutions from iteration to iteration. A solution obtained this way may be used as is or as an initial solution to initialize additional deterministic optimization algorithms. A number of representative case examples for time-fixed and time-varying conditions yielded search times that are typically on the order of minutes, thus demonstrating the viability of this method. This approach is applicable to all deep space and planet Earth missions requiring greater spacecraft autonomy, and greatly facilitates navigation and science observation planning.
Assessment of Preconditioner for a USM3D Hierarchical Adaptive Nonlinear Method (HANIM) (Invited)

NASA Technical Reports Server (NTRS)

Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.

2016-01-01

Enhancements to the previously reported mixed-element USM3D Hierarchical Adaptive Nonlinear Iteration Method (HANIM) framework have been made to further improve robustness, efficiency, and accuracy of computational fluid dynamic simulations. The key enhancements include a multi-color line-implicit preconditioner, a discretely consistent symmetry boundary condition, and a line-mapping method for the turbulence source term discretization. The USM3D iterative convergence for the turbulent flows is assessed on four configurations. The configurations include a two-dimensional (2D) bump-in-channel, the 2D NACA 0012 airfoil, a three-dimensional (3D) bump-in-channel, and a 3D hemisphere cylinder. The Reynolds Averaged Navier Stokes (RANS) solutions have been obtained using a Spalart-Allmaras turbulence model and families of uniformly refined nested grids. Two types of HANIM solutions using line- and point-implicit preconditioners have been computed. Additional solutions using the point-implicit preconditioner alone (PA) method that broadly represents the baseline solver technology have also been computed. The line-implicit HANIM shows superior iterative convergence in most cases with progressively increasing benefits on finer grids.
Laser simulation applying Fox-Li iteration: investigation of reason for non-convergence

NASA Astrophysics Data System (ADS)

Paxton, Alan H.; Yang, Chi

2017-02-01

Fox-Li iteration is often used to numerically simulate lasers. If a solution is found, the complex field amplitude is a good indication of the laser mode. The case of a semiconductor laser, for which the medium possesses a self-focusing nonlinearity, was investigated. For a case of interest, the iterations did not yield a converged solution. Another approach was needed to explore the properties of the laser mode. The laser was treated (unphysically) as a regenerative amplifier. As the input to the amplifier, we required a smooth complex field distribution that matched the laser resonator. To obtain such a field, we found what would be the solution for the laser field if the strength of the self focusing nonlinearity were α = 0. This was used as the input to the laser, treated as an amplifier. Because the beam deteriorated as it propagated multiple passes in the resonator and through the gain medium (for α = 2.7), we concluded that a mode with good beam quality could not exist in the laser.
Fast solution of elliptic partial differential equations using linear combinations of plane waves.

PubMed

Pérez-Jordá, José M

2016-02-01

Given an arbitrary elliptic partial differential equation (PDE), a procedure for obtaining its solution is proposed based on the method of Ritz: the solution is written as a linear combination of plane waves and the coefficients are obtained by variational minimization. The PDE to be solved is cast as a system of linear equations Ax=b, where the matrix A is not sparse, which prevents the straightforward application of standard iterative methods in order to solve it. This sparseness problem can be circumvented by means of a recursive bisection approach based on the fast Fourier transform, which makes it possible to implement fast versions of some stationary iterative methods (such as Gauss-Seidel) consuming O(NlogN) memory and executing an iteration in O(Nlog(2)N) time, N being the number of plane waves used. In a similar way, fast versions of Krylov subspace methods and multigrid methods can also be implemented. These procedures are tested on Poisson's equation expressed in adaptive coordinates. It is found that the best results are obtained with the GMRES method using a multigrid preconditioner with Gauss-Seidel relaxation steps.
On the implementation of an accurate and efficient solver for convection-diffusion equations

NASA Astrophysics Data System (ADS)

Wu, Chin-Tien

In this dissertation, we examine several different aspects of computing the numerical solution of the convection-diffusion equation. The solution of this equation often exhibits sharp gradients due to Dirichlet outflow boundaries or discontinuities in boundary conditions. Because of the singular-perturbed nature of the equation, numerical solutions often have severe oscillations when grid sizes are not small enough to resolve sharp gradients. To overcome such difficulties, the streamline diffusion discretization method can be used to obtain an accurate approximate solution in regions where the solution is smooth. To increase accuracy of the solution in the regions containing layers, adaptive mesh refinement and mesh movement based on a posteriori error estimations can be employed. An error-adapted mesh refinement strategy based on a posteriori error estimations is also proposed to resolve layers. For solving the sparse linear systems that arise from discretization, goemetric multigrid (MG) and algebraic multigrid (AMG) are compared. In addition, both methods are also used as preconditioners for Krylov subspace methods. We derive some convergence results for MG with line Gauss-Seidel smoothers and bilinear interpolation. Finally, while considering adaptive mesh refinement as an integral part of the solution process, it is natural to set a stopping tolerance for the iterative linear solvers on each mesh stage so that the difference between the approximate solution obtained from iterative methods and the finite element solution is bounded by an a posteriori error bound. Here, we present two stopping criteria. The first is based on a residual-type a posteriori error estimator developed by Verfurth. The second is based on an a posteriori error estimator, using local solutions, developed by Kay and Silvester. Our numerical results show the refined mesh obtained from the iterative solution which satisfies the second criteria is similar to the refined mesh obtained from the finite element solution.
Progress in Development of the ITER Plasma Control System Simulation Platform

NASA Astrophysics Data System (ADS)

Walker, Michael; Humphreys, David; Sammuli, Brian; Ambrosino, Giuseppe; de Tommasi, Gianmaria; Mattei, Massimiliano; Raupp, Gerhard; Treutterer, Wolfgang; Winter, Axel

2017-10-01

We report on progress made and expected uses of the Plasma Control System Simulation Platform (PCSSP), the primary test environment for development of the ITER Plasma Control System (PCS). PCSSP will be used for verification and validation of the ITER PCS Final Design for First Plasma, to be completed in 2020. We discuss the objectives of PCSSP, its overall structure, selected features, application to existing devices, and expected evolution over the lifetime of the ITER PCS. We describe an archiving solution for simulation results, methods for incorporating physics models of the plasma and physical plant (tokamak, actuator, and diagnostic systems) into PCSSP, and defining characteristics of models suitable for a plasma control development environment such as PCSSP. Applications of PCSSP simulation models including resistive plasma equilibrium evolution are demonstrated. PCSSP development supported by ITER Organization under ITER/CTS/6000000037. Resistive evolution code developed under General Atomics' Internal funding. The views and opinions expressed herein do not necessarily reflect those of the ITER Organization.
Cleaning of first mirrors in ITER by means of radio frequency discharges.

PubMed

Leipold, F; Reichle, R; Vorpahl, C; Mukhin, E E; Dmitriev, A M; Razdobarin, A G; Samsonov, D S; Marot, L; Moser, L; Steiner, R; Meyer, E

2016-11-01

First mirrors of optical diagnostics in ITER are subject to charge exchange fluxes of Be, W, and potentially other elements. This may degrade the optical performance significantly via erosion or deposition. In order to restore reflectivity, cleaning by applying radio frequency (RF) power to the mirror itself and thus creating a discharge in front of the mirror will be used. The plasma generated in front of the mirror surface sputters off deposition, restoring its reflectivity. Although the functionality of such a mirror cleaning technique is proven in laboratory experiments, the technical implementation in ITER revealed obstacles which needs to be overcome: Since the discharge as an RF load in general is not very well matched to the power generator and transmission line, power reflections will occur leading to a thermal load of the cable. Its implementation for ITER requires additional R&D. This includes the design of mirrors as RF electrodes, as well as feeders and matching networks inside the vacuum vessel. Mitigation solutions will be evaluated and discussed. Furthermore, technical obstacles (i.e., cooling water pipes for the mirrors) need to be solved. Since cooling water lines are usually on ground potential at the feed through of the vacuum vessel, a solution to decouple the ground potential from the mirror would be a major simplification. Such a solution will be presented.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

NASA Technical Reports Server (NTRS)

Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

1996-01-01

We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
Modules and methods for all photonic computing

DOEpatents

Schultz, David R.; Ma, Chao Hung

2001-01-01

A method for all photonic computing, comprising the steps of: encoding a first optical/electro-optical element with a two dimensional mathematical function representing input data; illuminating the first optical/electro-optical element with a collimated beam of light; illuminating a second optical/electro-optical element with light from the first optical/electro-optical element, the second optical/electro-optical element having a characteristic response corresponding to an iterative algorithm useful for solving a partial differential equation; iteratively recirculating the signal through the second optical/electro-optical element with light from the second optical/electro-optical element for a predetermined number of iterations; and, after the predetermined number of iterations, optically and/or electro-optically collecting output data representing an iterative optical solution from the second optical/electro-optical element.
TH-AB-BRA-09: Stability Analysis of a Novel Dose Calculation Algorithm for MRI Guided Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zelyak, O; Fallone, B; Cross Cancer Institute, Edmonton, AB

2016-06-15

Purpose: To determine the iterative deterministic solution stability of the Linear Boltzmann Transport Equation (LBTE) in the presence of magnetic fields. Methods: The LBTE with magnetic fields under investigation is derived using a discrete ordinates approach. The stability analysis is performed using analytical and numerical methods. Analytically, the spectral Fourier analysis is used to obtain the convergence rate of the source iteration procedures based on finding the largest eigenvalue of the iterative operator. This eigenvalue is a function of relevant physical parameters, such as magnetic field strength and material properties, and provides essential information about the domain of applicability requiredmore » for clinically optimal parameter selection and maximum speed of convergence. The analytical results are reinforced by numerical simulations performed using the same discrete ordinates method in angle, and a discontinuous finite element spatial approach. Results: The spectral radius for the source iteration technique of the time independent transport equation with isotropic and anisotropic scattering centers inside infinite 3D medium is equal to the ratio of differential and total cross sections. The result is confirmed numerically by solving LBTE and is in full agreement with previously published results. The addition of magnetic field reveals that the convergence becomes dependent on the strength of magnetic field, the energy group discretization, and the order of anisotropic expansion. Conclusion: The source iteration technique for solving the LBTE with magnetic fields with the discrete ordinates method leads to divergent solutions in the limiting cases of small energy discretizations and high magnetic field strengths. Future investigations into non-stationary Krylov subspace techniques as an iterative solver will be performed as this has been shown to produce greater stability than source iteration. Furthermore, a stability analysis of a discontinuous finite element space-angle approach (which has been shown to provide the greatest stability) will also be investigated. Dr. B Gino Fallone is a co-founder and CEO of MagnetTx Oncology Solutions (under discussions to license Alberta bi-planar linac MR for commercialization)« less
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Multigrid techniques for nonlinear eigenvalue probems: Solutions of a nonlinear Schroedinger eigenvalue problem in 2D and 3D

NASA Technical Reports Server (NTRS)

Costiner, Sorin; Taasan, Shlomo

1994-01-01

This paper presents multigrid (MG) techniques for nonlinear eigenvalue problems (EP) and emphasizes an MG algorithm for a nonlinear Schrodinger EP. The algorithm overcomes the mentioned difficulties combining the following techniques: an MG projection coupled with backrotations for separation of solutions and treatment of difficulties related to clusters of close and equal eigenvalues; MG subspace continuation techniques for treatment of the nonlinearity; an MG simultaneous treatment of the eigenvectors at the same time with the nonlinearity and with the global constraints. The simultaneous MG techniques reduce the large number of self consistent iterations to only a few or one MG simultaneous iteration and keep the solutions in a right neighborhood where the algorithm converges fast.
Some new results on the central overlap problem in astrometry

NASA Astrophysics Data System (ADS)

Rapaport, M.

1998-07-01

The central overlap problem in astrometry has been revisited in the recent last years by Eichhorn (1988) who explicitly inverted the matrix of a constrained least squares problem. In this paper, the general explicit solution of the unconstrained central overlap problem is given. We also give the explicit solution for an other set of constraints; this result is a confirmation of a conjecture expressed by Eichhorn (1988). We also consider the use of iterative methods to solve the central overlap problem. A surprising result is obtained when the classical Gauss Seidel method is used; the iterations converge immediately to the general solution of the equations; we explain this property writing the central overlap problem in a new set of variables.
Preconditioned conjugate residual methods for the solution of spectral equations

NASA Technical Reports Server (NTRS)

Wong, Y. S.; Zang, T. A.; Hussaini, M. Y.

1986-01-01

Conjugate residual methods for the solution of spectral equations are described. An inexact finite-difference operator is introduced as a preconditioner in the iterative procedures. Application of these techniques is limited to problems for which the symmetric part of the coefficient matrix is positive definite. Although the spectral equation is a very ill-conditioned and full matrix problem, the computational effort of the present iterative methods for solving such a system is comparable to that for the sparse matrix equations obtained from the application of either finite-difference or finite-element methods to the same problems. Numerical experiments are shown for a self-adjoint elliptic partial differential equation with Dirichlet boundary conditions, and comparison with other solution procedures for spectral equations is presented.
Parameterizations for ensemble Kalman inversion

NASA Astrophysics Data System (ADS)

Chada, Neil K.; Iglesias, Marco A.; Roininen, Lassi; Stuart, Andrew M.

2018-05-01

The use of ensemble methods to solve inverse problems is attractive because it is a derivative-free methodology which is also well-adapted to parallelization. In its basic iterative form the method produces an ensemble of solutions which lie in the linear span of the initial ensemble. Choice of the parameterization of the unknown field is thus a key component of the success of the method. We demonstrate how both geometric ideas and hierarchical ideas can be used to design effective parameterizations for a number of applied inverse problems arising in electrical impedance tomography, groundwater flow and source inversion. In particular we show how geometric ideas, including the level set method, can be used to reconstruct piecewise continuous fields, and we show how hierarchical methods can be used to learn key parameters in continuous fields, such as length-scales, resulting in improved reconstructions. Geometric and hierarchical ideas are combined in the level set method to find piecewise constant reconstructions with interfaces of unknown topology.
Computational methods for coupling microstructural and micromechanical materials response simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

HOLM,ELIZABETH A.; BATTAILE,CORBETT C.; BUCHHEIT,THOMAS E.

2000-04-01

Computational materials simulations have traditionally focused on individual phenomena: grain growth, crack propagation, plastic flow, etc. However, real materials behavior results from a complex interplay between phenomena. In this project, the authors explored methods for coupling mesoscale simulations of microstructural evolution and micromechanical response. In one case, massively parallel (MP) simulations for grain evolution and microcracking in alumina stronglink materials were dynamically coupled. In the other, codes for domain coarsening and plastic deformation in CuSi braze alloys were iteratively linked. this program provided the first comparison of two promising ways to integrate mesoscale computer codes. Coupled microstructural/micromechanical codes were appliedmore » to experimentally observed microstructures for the first time. In addition to the coupled codes, this project developed a suite of new computational capabilities (PARGRAIN, GLAD, OOF, MPM, polycrystal plasticity, front tracking). The problem of plasticity length scale in continuum calculations was recognized and a solution strategy was developed. The simulations were experimentally validated on stockpile materials.« less
Analysis of a closed-kinematic chain robot manipulator

NASA Technical Reports Server (NTRS)

Nguyen, Charles C.; Pooran, Farhad J.

1988-01-01

Presented are the research results from the research grant entitled: Active Control of Robot Manipulators, sponsored by the Goddard Space Flight Center (NASA) under grant number NAG-780. This report considers a class of robot manipulators based on the closed-kinematic chain mechanism (CKCM). This type of robot manipulators mainly consists of two platforms, one is stationary and the other moving, and they are coupled together through a number of in-parallel actuators. Using spatial geometry and homogeneous transformation, a closed-form solution is derived for the inverse kinematic problem of the six-degree-of-freedom manipulator, built to study robotic assembly in space. Iterative Newton Raphson method is employed to solve the forward kinematic problem. Finally, the equations of motion of the above manipulators are obtained by employing the Lagrangian method. Study of the manipulator dynamics is performed using computer simulation whose results show that the robot actuating forces are strongly dependent on the mass and centroid locations of the robot links.

Implementation of an effective hybrid GA for large-scale traveling salesman problems.

PubMed

Nguyen, Hung Dinh; Yoshihara, Ikuo; Yamamori, Kunihito; Yasunaga, Moritoshi

2007-02-01

This correspondence describes a hybrid genetic algorithm (GA) to find high-quality solutions for the traveling salesman problem (TSP). The proposed method is based on a parallel implementation of a multipopulation steady-state GA involving local search heuristics. It uses a variant of the maximal preservative crossover and the double-bridge move mutation. An effective implementation of the Lin-Kernighan heuristic (LK) is incorporated into the method to compensate for the GA's lack of local search ability. The method is validated by comparing it with the LK-Helsgaun method (LKH), which is one of the most effective methods for the TSP. Experimental results with benchmarks having up to 316228 cities show that the proposed method works more effectively and efficiently than LKH when solving large-scale problems. Finally, the method is used together with the implementation of the iterated LK to find a new best tour (as of June 2, 2003) for a 1904711-city TSP challenge.
Development and Application of the Collaborative Optimization Architecture in a Multidisciplinary Design Environment

NASA Technical Reports Server (NTRS)

Braun, R. D.; Kroo, I. M.

1995-01-01

Collaborative optimization is a design architecture applicable in any multidisciplinary analysis environment but specifically intended for large-scale distributed analysis applications. In this approach, a complex problem is hierarchically de- composed along disciplinary boundaries into a number of subproblems which are brought into multidisciplinary agreement by a system-level coordination process. When applied to problems in a multidisciplinary design environment, this scheme has several advantages over traditional solution strategies. These advantageous features include reducing the amount of information transferred between disciplines, the removal of large iteration-loops, allowing the use of different subspace optimizers among the various analysis groups, an analysis framework which is easily parallelized and can operate on heterogenous equipment, and a structural framework that is well-suited for conventional disciplinary organizations. In this article, the collaborative architecture is developed and its mathematical foundation is presented. An example application is also presented which highlights the potential of this method for use in large-scale design applications.
Chemical transport in a fissured rock: Verification of a numerical model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rasmuson, A.; Narasimhan, T. N.; Neretnieks, I.

1982-10-01

Numerical models for simulating chemical transport in fissured rocks constitute powerful tools for evaluating the acceptability of geological nuclear waste repositories. Due to the very long-term, high toxicity of some nuclear waste products, the models are required to predict, in certain cases, the spatial and temporal distribution of chemical concentration less than 0.001% of the concentration released from the repository. Whether numerical models can provide such accuracies is a major question addressed in the present work. To this end, we have verified a numerical model, TRUMP, which solves the advective diffusion equation in general three dimensions with or without decaymore » and source terms. The method is based on an integrated finite-difference approach. The model was verified against known analytic solution of the one-dimensional advection-diffusion problem as well as the problem of advection-diffusion in a system of parallel fractures separated by spherical particles. The studies show that as long as the magnitude of advectance is equal to or less than that of conductance for the closed surface bounding any volume element in the region (that is, numerical Peclet number <2), the numerical method can indeed match the analytic solution within errors of ±10{sup -3} % or less. The realistic input parameters used in the sample calculations suggest that such a range of Peclet numbers is indeed likely to characterize deep groundwater systems in granitic and ancient argillaceous systems. Thus TRUMP in its present form does provide a viable tool for use in nuclear waste evaluation studies. A sensitivity analysis based on the analytic solution suggests that the errors in prediction introduced due to uncertainties in input parameters is likely to be larger than the computational inaccuracies introduced by the numerical model. Currently, a disadvantage in the TRUMP model is that the iterative method of solving the set of simultaneous equations is rather slow when time constants vary widely over the flow region. Although the iterative solution may be very desirable for large three-dimensional problems in order to minimize computer storage, it seems desirable to use a direct solver technique in conjunction with the mixed explicit-implicit approach whenever possible. work in this direction is in progress.« less
Non-oscillatory and non-diffusive solution of convection problems by the iteratively reweighted least-squares finite element method

NASA Technical Reports Server (NTRS)

Jiang, Bo-Nan

1993-01-01

A comparative description is presented for the least-squares FEM (LSFEM) for 2D steady-state pure convection problems. In addition to exhibiting better control of the streamline derivative than the streamline upwinding Petrov-Galerkin method, numerical convergence rates are obtained which show the LSFEM to be virtually optimal. The LSFEM is used as a framework for an iteratively reweighted LSFEM yielding nonoscillatory and nondiffusive solutions for problems with contact discontinuities; this method is shown to convect contact discontinuities without error when using triangular and bilinear elements.
Implementing the Gaia Astrometric Global Iterative Solution (AGIS) in Java

NASA Astrophysics Data System (ADS)

O'Mullane, William; Lammers, Uwe; Lindegren, Lennart; Hernandez, Jose; Hobbs, David

2011-10-01

This paper provides a description of the Java software framework which has been constructed to run the Astrometric Global Iterative Solution for the Gaia mission. This is the mathematical framework to provide the rigid reference frame for Gaia observations from the Gaia data itself. This process makes Gaia a self calibrated, and input catalogue independent, mission. The framework is highly distributed typically running on a cluster of machines with a database back end. All code is written in the Java language. We describe the overall architecture and some of the details of the implementation.
Capture zones for simple aquifers

USGS Publications Warehouse

McElwee, Carl D.

1991-01-01

Capture zones showing the area influenced by a well within a certain time are useful for both aquifer protection and cleanup. If hydrodynamic dispersion is neglected, a deterministic curve defines the capture zone. Analytical expressions for the capture zones can be derived for simple aquifers. However, the capture zone equations are transcendental and cannot be explicitly solved for the coordinates of the capture zone boundary. Fortunately, an iterative scheme allows the solution to proceed quickly and efficiently even on a modest personal computer. Three forms of the analytical solution must be used in an iterative scheme to cover the entire region of interest, after the extreme values of the x coordinate are determined by an iterative solution. The resulting solution is a discrete one, and usually 100-1000 intervals along the x-axis are necessary for a smooth definition of the capture zone. The presented program is written in FORTRAN and has been used in a variety of computing environments. No graphics capability is included with the program; it is assumed the user has access to a commercial package. The superposition of capture zones for multiple wells is expected to be satisfactory if the spacing is not too close. Because this program deals with simple aquifers, the results rarely will be the final word in a real application.
A Fourier-based compressed sensing technique for accelerated CT image reconstruction using first-order methods.

PubMed

Choi, Kihwan; Li, Ruijiang; Nam, Haewon; Xing, Lei

2014-06-21

As a solution to iterative CT image reconstruction, first-order methods are prominent for the large-scale capability and the fast convergence rate [Formula: see text]. In practice, the CT system matrix with a large condition number may lead to slow convergence speed despite the theoretically promising upper bound. The aim of this study is to develop a Fourier-based scaling technique to enhance the convergence speed of first-order methods applied to CT image reconstruction. Instead of working in the projection domain, we transform the projection data and construct a data fidelity model in Fourier space. Inspired by the filtered backprojection formalism, the data are appropriately weighted in Fourier space. We formulate an optimization problem based on weighted least-squares in the Fourier space and total-variation (TV) regularization in image space for parallel-beam, fan-beam and cone-beam CT geometry. To achieve the maximum computational speed, the optimization problem is solved using a fast iterative shrinkage-thresholding algorithm with backtracking line search and GPU implementation of projection/backprojection. The performance of the proposed algorithm is demonstrated through a series of digital simulation and experimental phantom studies. The results are compared with the existing TV regularized techniques based on statistics-based weighted least-squares as well as basic algebraic reconstruction technique. The proposed Fourier-based compressed sensing (CS) method significantly improves both the image quality and the convergence rate compared to the existing CS techniques.
Variable aperture-based ptychographical iterative engine method.

PubMed

Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

2018-02-01

A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. By adjusting the size of a tiny aperture under the illumination of a parallel light beam to change the illumination on the sample step by step and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since many fewer diffraction patterns are required than in common PIE and the shape, the size, and the position of the aperture need not to be known exactly, this proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; therefore, the proposed technique can be potentially applied for various scientific researches. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
On iterative processes in the Krylov-Sonneveld subspaces

NASA Astrophysics Data System (ADS)

Ilin, Valery P.

2016-10-01

The iterative Induced Dimension Reduction (IDR) methods are considered for solving large systems of linear algebraic equations (SLAEs) with nonsingular nonsymmetric matrices. These approaches are investigated by many authors and are charachterized sometimes as the alternative to the classical processes of Krylov type. The key moments of the IDR algorithms consist in the construction of the embedded Sonneveld subspaces, which have the decreasing dimensions and use the orthogonalization to some fixed subspace. Other independent approaches for research and optimization of the iterations are based on the augmented and modified Krylov subspaces by using the aggregation and deflation procedures with present various low rank approximations of the original matrices. The goal of this paper is to show, that IDR method in Sonneveld subspaces present an original interpretation of the modified algorithms in the Krylov subspaces. In particular, such description is given for the multi-preconditioned semi-conjugate direction methods which are actual for the parallel algebraic domain decomposition approaches.
Iterated reaction graphs: simulating complex Maillard reaction pathways.

PubMed

Patel, S; Rabone, J; Russell, S; Tissen, J; Klaffke, W

2001-01-01

This study investigates a new method of simulating a complex chemical system including feedback loops and parallel reactions. The practical purpose of this approach is to model the actual reactions that take place in the Maillard process, a set of food browning reactions, in sufficient detail to be able to predict the volatile composition of the Maillard products. The developed framework, called iterated reaction graphs, consists of two main elements: a soup of molecules and a reaction base of Maillard reactions. An iterative process loops through the reaction base, taking reactants from and feeding products back to the soup. This produces a reaction graph, with molecules as nodes and reactions as arcs. The iterated reaction graph is updated and validated by comparing output with the main products found by classical gas-chromatographic/mass spectrometric analysis. To ensure a realistic output and convergence to desired volatiles only, the approach contains a number of novel elements: rate kinetics are treated as reaction probabilities; only a subset of the true chemistry is modeled; and the reactions are blocked into groups.
Subsonic panel method for designing wing surfaces from pressure distribution

NASA Technical Reports Server (NTRS)

Bristow, D. R.; Hawk, J. D.

1983-01-01

An iterative method has been developed for designing wing section contours corresponding to a prescribed subcritical distribution of pressure. The calculations are initialized by using a surface panel method to analyze a baseline wing or wing-fuselage configuration. A first-order expansion to the baseline panel method equations is then used to calculate a matrix containing the partial derivative of potential at each control point with respect to each unknown geometry parameter. In every iteration cycle, the matrix is used both to calculate the geometry perturbation and to analyze the perturbed geometry. The distribution of potential on the perturbed geometry is established by simple linear extrapolation from the baseline solution. The extrapolated potential is converted to pressure by Bernoulli's equation. Not only is the accuracy of the approach good for very large perturbations, but the computing cost of each complete iteration cycle is substantially less than one analysis solution by a conventional panel method.
Study on monostable and bistable reaction-diffusion equations by iteration of travelling wave maps

NASA Astrophysics Data System (ADS)

Yi, Taishan; Chen, Yuming

2017-12-01

In this paper, based on the iterative properties of travelling wave maps, we develop a new method to obtain spreading speeds and asymptotic propagation for monostable and bistable reaction-diffusion equations. Precisely, for Dirichlet problems of monostable reaction-diffusion equations on the half line, by making links between travelling wave maps and integral operators associated with the Dirichlet diffusion kernel (the latter is NOT invariant under translation), we obtain some iteration properties of the Dirichlet diffusion and some a priori estimates on nontrivial solutions of Dirichlet problems under travelling wave transformation. We then provide the asymptotic behavior of nontrivial solutions in the space-time region for Dirichlet problems. These enable us to develop a unified method to obtain results on heterogeneous steady states, travelling waves, spreading speeds, and asymptotic spreading behavior for Dirichlet problem of monostable reaction-diffusion equations on R+ as well as of monostable/bistable reaction-diffusion equations on R.
Auxiliary principle technique and iterative algorithm for a perturbed system of generalized multi-valued mixed quasi-equilibrium-like problems.

PubMed

Rahaman, Mijanur; Pang, Chin-Tzong; Ishtyak, Mohd; Ahmad, Rais

2017-01-01

In this article, we introduce a perturbed system of generalized mixed quasi-equilibrium-like problems involving multi-valued mappings in Hilbert spaces. To calculate the approximate solutions of the perturbed system of generalized multi-valued mixed quasi-equilibrium-like problems, firstly we develop a perturbed system of auxiliary generalized multi-valued mixed quasi-equilibrium-like problems, and then by using the celebrated Fan-KKM technique, we establish the existence and uniqueness of solutions of the perturbed system of auxiliary generalized multi-valued mixed quasi-equilibrium-like problems. By deploying an auxiliary principle technique and an existence result, we formulate an iterative algorithm for solving the perturbed system of generalized multi-valued mixed quasi-equilibrium-like problems. Lastly, we study the strong convergence analysis of the proposed iterative sequences under monotonicity and some mild conditions. These results are new and generalize some known results in this field.
A preconditioner for the finite element computation of incompressible, nonlinear elastic deformations

NASA Astrophysics Data System (ADS)

Whiteley, J. P.

2017-10-01

Large, incompressible elastic deformations are governed by a system of nonlinear partial differential equations. The finite element discretisation of these partial differential equations yields a system of nonlinear algebraic equations that are usually solved using Newton's method. On each iteration of Newton's method, a linear system must be solved. We exploit the structure of the Jacobian matrix to propose a preconditioner, comprising two steps. The first step is the solution of a relatively small, symmetric, positive definite linear system using the preconditioned conjugate gradient method. This is followed by a small number of multigrid V-cycles for a larger linear system. Through the use of exemplar elastic deformations, the preconditioner is demonstrated to facilitate the iterative solution of the linear systems arising. The number of GMRES iterations required has only a very weak dependence on the number of degrees of freedom of the linear systems.
DIII-D accomplishments and plans in support of fusion next steps

DOE PAGES

Buttery, R. J; Eidietis, N.; Holcomb, C.; ...

2013-06-01

DIII-D is using its flexibility and diagnostics to address the critical science required to enable next step fusion devices. We have adapted operating scenarios for ITER to low torque and are now being optimized for transport. Three ELM mitigation scenarios have been developed to near-ITER parameters. New control techniques are managing the most challenging plasma instabilities. Disruption mitigation tools show promising dissipation strategies for runaway electrons and heat load. An off axis neutral beam upgrade has enabled sustainment of high βN capable steady state regimes. Divertor research is identifying the challenge, physics and candidate solutions for handling the hot plasmamore » exhaust with notable progress in heat flux reduction using the snowflake configuration. Our work is helping optimize design choices and prepare the scientific tools for operation in ITER, and resolve key elements of the plasma configuration and divertor solution for an FNSF.« less
Incorporating Prototyping and Iteration into Intervention Development: A Case Study of a Dining Hall-Based Intervention

ERIC Educational Resources Information Center

McClain, Arianna D.; Hekler, Eric B.; Gardner, Christopher D.

2013-01-01

Background: Previous research from the fields of computer science and engineering highlight the importance of an iterative design process (IDP) to create more creative and effective solutions. Objective: This study describes IDP as a new method for developing health behavior interventions and evaluates the effectiveness of a dining hall--based…
US NDC Modernization Iteration E1 Prototyping Report: User Interface Framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lober, Randall R.

2014-12-01

During the first iteration of the US NDC Modernization Elaboration phase (E1), the SNL US NDC modernization project team completed an initial survey of applicable COTS solutions, and established exploratory prototyping related to the User Interface Framework (UIF) in support of system architecture definition. This report summarizes these activities and discusses planned follow-on work.
US NDC Modernization Iteration E1 Prototyping Report: Common Object Interface

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lewis, Jennifer E.; Hess, Michael M.

2014-12-01

During the first iteration of the US NDC Modernization Elaboration phase (E1), the SNL US NDC modernization project team completed an initial survey of applicable COTS solutions, and established exploratory prototyping related to the Common Object Interface (COI) in support of system architecture definition. This report summarizes these activities and discusses planned follow-on work.
US NDC Modernization Iteration E1 Prototyping Report: Processing Control Framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prescott, Ryan; Hamlet, Benjamin R.

2014-12-01

During the first iteration of the US NDC Modernization Elaboration phase (E1), the SNL US NDC modernization project team developed an initial survey of applicable COTS solutions, and established exploratory prototyping related to the processing control framework in support of system architecture definition. This report summarizes these activities and discusses planned follow-on work.
Parallelization of a blind deconvolution algorithm

NASA Astrophysics Data System (ADS)

Matson, Charles L.; Borelli, Kathy J.

2006-09-01

Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image depending how many frames of data are used for the deblurring and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.