Sample records for iterative parallel synthesis

  1. Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

    NASA Technical Reports Server (NTRS)

    Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

    1991-01-01

    Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.

  2. Parallel medicinal chemistry approaches to selective HDAC1/HDAC2 inhibitor (SHI-1:2) optimization.

    PubMed

    Kattar, Solomon D; Surdi, Laura M; Zabierek, Anna; Methot, Joey L; Middleton, Richard E; Hughes, Bethany; Szewczak, Alexander A; Dahlberg, William K; Kral, Astrid M; Ozerova, Nicole; Fleming, Judith C; Wang, Hongmei; Secrist, Paul; Harsch, Andreas; Hamill, Julie E; Cruz, Jonathan C; Kenific, Candia M; Chenard, Melissa; Miller, Thomas A; Berk, Scott C; Tempest, Paul

    2009-02-15

    The successful application of both solid and solution phase library synthesis, combined with tight integration into the medicinal chemistry effort, resulted in the efficient optimization of a novel structural series of selective HDAC1/HDAC2 inhibitors by the MRL-Boston Parallel Medicinal Chemistry group. An initial lead from a small parallel library was found to be potent and selective in biochemical assays. Advanced compounds were the culmination of iterative library design and possess excellent biochemical and cellular potency, as well as acceptable PK and efficacy in animal models.

  3. Design of a tight frame of 2D shearlets based on a fast non-iterative analysis and synthesis algorithm

    NASA Astrophysics Data System (ADS)

    Goossens, Bart; Aelterman, Jan; Luong, Hi"p.; Pižurica, Aleksandra; Philips, Wilfried

    2011-09-01

    The shearlet transform is a recent sibling in the family of geometric image representations that provides a traditional multiresolution analysis combined with a multidirectional analysis. In this paper, we present a fast DFT-based analysis and synthesis scheme for the 2D discrete shearlet transform. Our scheme conforms to the continuous shearlet theory to high extent, provides perfect numerical reconstruction (up to floating point rounding errors) in a non-iterative scheme and is highly suitable for parallel implementation (e.g. FPGA, GPU). We show that our discrete shearlet representation is also a tight frame and the redundancy factor of the transform is around 2.6, independent of the number of analysis directions. Experimental denoising results indicate that the transform performs the same or even better than several related multiresolution transforms, while having a significantly lower redundancy factor.

  4. An iterative method for systems of nonlinear hyperbolic equations

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.

    1989-01-01

    An iterative algorithm for the efficient solution of systems of nonlinear hyperbolic equations is presented. Parallelism is evident at several levels. In the formation of the iteration, the equations are decoupled, thereby providing large grain parallelism. Parallelism may also be exploited within the solves for each equation. Convergence of the interation is established via a bounding function argument. Experimental results in two-dimensions are presented.

  5. Single-agent parallel window search

    NASA Technical Reports Server (NTRS)

    Powley, Curt; Korf, Richard E.

    1991-01-01

    Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A(asterisk) (IDA-asterisk) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA-asterisk by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.

  6. Solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators.

    PubMed

    Zhao, Jing; Zong, Haili

    2018-01-01

    In this paper, we propose parallel and cyclic iterative algorithms for solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators. We also combine the process of cyclic and parallel iterative methods and propose two mixed iterative algorithms. Our several algorithms do not need any prior information about the operator norms. Under mild assumptions, we prove weak convergence of the proposed iterative sequences in Hilbert spaces. As applications, we obtain several iterative algorithms to solve the multiple-set split equality problem.

  7. Parallelized implicit propagators for the finite-difference Schrödinger equation

    NASA Astrophysics Data System (ADS)

    Parker, Jonathan; Taylor, K. T.

    1995-08-01

    We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.

  8. Iterative algorithms for large sparse linear systems on parallel computers

    NASA Technical Reports Server (NTRS)

    Adams, L. M.

    1982-01-01

    Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.

  9. Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi

    PubMed Central

    Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil

    2012-01-01

    The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480

  10. Twostep-by-twostep PIRK-type PC methods with continuous output formulas

    NASA Astrophysics Data System (ADS)

    Cong, Nguyen Huu; Xuan, Le Ngoc

    2008-11-01

    This paper deals with parallel predictor-corrector (PC) iteration methods based on collocation Runge-Kutta (RK) corrector methods with continuous output formulas for solving nonstiff initial-value problems (IVPs) for systems of first-order differential equations. At nth step, the continuous output formulas are used not only for predicting the stage values in the PC iteration methods but also for calculating the step values at (n+2)th step. In this case, the integration processes can be proceeded twostep-by-twostep. The resulting twostep-by-twostep (TBT) parallel-iterated RK-type (PIRK-type) methods with continuous output formulas (twostep-by-twostep PIRKC methods or TBTPIRKC methods) give us a faster integration process. Fixed stepsize applications of these TBTPIRKC methods to a few widely-used test problems reveal that the new PC methods are much more efficient when compared with the well-known parallel-iterated RK methods (PIRK methods), parallel-iterated RK-type PC methods with continuous output formulas (PIRKC methods) and sequential explicit RK codes DOPRI5 and DOP853 available from the literature.

  11. Design and implementation of highly parallel pipelined VLSI systems

    NASA Astrophysics Data System (ADS)

    Delange, Alphonsus Anthonius Jozef

    A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.

  12. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).

  13. AZTEC: A parallel iterative package for the solving linear systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

    1996-12-31

    We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.

  14. Matching pursuit parallel decomposition of seismic data

    NASA Astrophysics Data System (ADS)

    Li, Chuanhui; Zhang, Fanchang

    2017-07-01

    In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.

  15. Organizing Compression of Hyperspectral Imagery to Allow Efficient Parallel Decompression

    NASA Technical Reports Server (NTRS)

    Klimesh, Matthew A.; Kiely, Aaron B.

    2014-01-01

    family of schemes has been devised for organizing the output of an algorithm for predictive data compression of hyperspectral imagery so as to allow efficient parallelization in both the compressor and decompressor. In these schemes, the compressor performs a number of iterations, during each of which a portion of the data is compressed via parallel threads operating on independent portions of the data. The general idea is that for each iteration it is predetermined how much compressed data will be produced from each thread.

  16. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.

  17. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    PubMed Central

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529

  18. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    NASA Astrophysics Data System (ADS)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)

  19. Computational aspects of helicopter trim analysis and damping levels from Floquet theory

    NASA Technical Reports Server (NTRS)

    Gaonkar, Gopal H.; Achar, N. S.

    1992-01-01

    Helicopter trim settings of periodic initial state and control inputs are investigated for convergence of Newton iteration in computing the settings sequentially and in parallel. The trim analysis uses a shooting method and a weak version of two temporal finite element methods with displacement formulation and with mixed formulation of displacements and momenta. These three methods broadly represent two main approaches of trim analysis: adaptation of initial-value and finite element boundary-value codes to periodic boundary conditions, particularly for unstable and marginally stable systems. In each method, both the sequential and in-parallel schemes are used and the resulting nonlinear algebraic equations are solved by damped Newton iteration with an optimally selected damping parameter. The impact of damped Newton iteration, including earlier-observed divergence problems in trim analysis, is demonstrated by the maximum condition number of the Jacobian matrices of the iterative scheme and by virtual elimination of divergence. The advantages of the in-parallel scheme over the conventional sequential scheme are also demonstrated.

  20. Parallel-vector computation for structural analysis and nonlinear unconstrained optimization problems

    NASA Technical Reports Server (NTRS)

    Nguyen, Duc T.

    1990-01-01

    Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.

  1. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1991-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.

  2. Exploiting parallel computing with limited program changes using a network of microcomputers

    NASA Technical Reports Server (NTRS)

    Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.

    1985-01-01

    Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronus processing was studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.

  3. An iterative reduced field-of-view reconstruction for periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI.

    PubMed

    Lin, Jyh-Miin; Patterson, Andrew J; Chang, Hing-Chiu; Gillard, Jonathan H; Graves, Martin J

    2015-10-01

    To propose a new reduced field-of-view (rFOV) strategy for iterative reconstructions in a clinical environment. Iterative reconstructions can incorporate regularization terms to improve the image quality of periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI. However, the large amount of calculations required for full FOV iterative reconstructions has posed a huge computational challenge for clinical usage. By subdividing the entire problem into smaller rFOVs, the iterative reconstruction can be accelerated on a desktop with a single graphic processing unit (GPU). This rFOV strategy divides the iterative reconstruction into blocks, based on the block-diagonal dominant structure. A near real-time reconstruction system was developed for the clinical MR unit, and parallel computing was implemented using the object-oriented model. In addition, the Toeplitz method was implemented on the GPU to reduce the time required for full interpolation. Using the data acquired from the PROPELLER MRI, the reconstructed images were then saved in the digital imaging and communications in medicine format. The proposed rFOV reconstruction reduced the gridding time by 97%, as the total iteration time was 3 s even with multiple processes running. A phantom study showed that the structure similarity index for rFOV reconstruction was statistically superior to conventional density compensation (p < 0.001). In vivo study validated the increased signal-to-noise ratio, which is over four times higher than with density compensation. Image sharpness index was improved using the regularized reconstruction implemented. The rFOV strategy permits near real-time iterative reconstruction to improve the image quality of PROPELLER images. Substantial improvements in image quality metrics were validated in the experiments. The concept of rFOV reconstruction may potentially be applied to other kinds of iterative reconstructions for shortened reconstruction duration.

  4. Vectorized and multitasked solution of the few-group neutron diffusion equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zee, S.K.; Turinsky, P.J.; Shayer, Z.

    1989-03-01

    A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. Formore » the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7. 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of --61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.« less

  5. An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

    NASA Astrophysics Data System (ADS)

    Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

    2015-07-01

    This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.

  6. Using parallel banded linear system solvers in generalized eigenvalue problems

    NASA Technical Reports Server (NTRS)

    Zhang, Hong; Moss, William F.

    1993-01-01

    Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.

  7. Parallel solution of the symmetric tridiagonal eigenproblem. Research report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jessup, E.R.

    1989-10-01

    This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an IPSC hypercube multiprocessor reveal that Cuppen's method ismore » the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptions of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less

  8. Parallel solution of the symmetric tridiagonal eigenproblem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jessup, E.R.

    1989-01-01

    This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accuratemore » approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less

  9. Parallel Preconditioning for CFD Problems on the CM-5

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.

  10. Parallel iterative solution for h and p approximations of the shallow water equations

    USGS Publications Warehouse

    Barragy, E.J.; Walters, R.A.

    1998-01-01

    A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.

  11. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

    PubMed Central

    Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

    2014-01-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299

  12. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

    PubMed

    Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

    2014-11-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.

  13. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1990-01-01

    Run time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases, where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run time, wave fronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.

  14. Domain decomposition methods for the parallel computation of reacting flows

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1988-01-01

    Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.

  15. Toward Generalization of Iterative Small Molecule Synthesis

    PubMed Central

    Lehmann, Jonathan W.; Blair, Daniel J.; Burke, Martin D.

    2018-01-01

    Small molecules have extensive untapped potential to benefit society, but access to this potential is too often restricted by limitations inherent to the customized approach currently used to synthesize this class of chemical matter. In contrast, the “building block approach”, i.e., generalized iterative assembly of interchangeable parts, has now proven to be a highly efficient and flexible way to construct things ranging all the way from skyscrapers to macromolecules to artificial intelligence algorithms. The structural redundancy found in many small molecules suggests that they possess a similar capacity for generalized building block-based construction. It is also encouraging that many customized iterative synthesis methods have been developed that improve access to specific classes of small molecules. There has also been substantial recent progress toward the iterative assembly of many different types of small molecules, including complex natural products, pharmaceuticals, biological probes, and materials, using common building blocks and coupling chemistry. Collectively, these advances suggest that a generalized building block approach for small molecule synthesis may be within reach. PMID:29696152

  16. From synthesis to function via iterative assembly of N-methyliminodiacetic acid boronate building blocks.

    PubMed

    Li, Junqi; Grillo, Anthony S; Burke, Martin D

    2015-08-18

    The study and optimization of small molecule function is often impeded by the time-intensive and specialist-dependent process that is typically used to make such compounds. In contrast, general and automated platforms have been developed for making peptides, oligonucleotides, and increasingly oligosaccharides, where synthesis is simplified to iterative applications of the same reactions. Inspired by the way natural products are biosynthesized via the iterative assembly of a defined set of building blocks, we developed a platform for small molecule synthesis involving the iterative coupling of haloboronic acids protected as the corresponding N-methyliminodiacetic acid (MIDA) boronates. Here we summarize our efforts thus far to develop this platform into a generalized and automated approach for small molecule synthesis. We and others have employed this approach to access many polyene-based compounds, including the polyene motifs found in >75% of all polyene natural products. This platform further allowed us to derivatize amphotericin B, the powerful and resistance-evasive but also highly toxic last line of defense in treating systemic fungal infections, and thereby understand its mechanism of action. This synthesis-enabled mechanistic understanding has led us to develop less toxic derivatives currently under evaluation as improved antifungal agents. To access more Csp(3)-containing small molecules, we gained a stereocontrolled entry into chiral, non-racemic α-boryl aldehydes through the discovery of a chiral derivative of MIDA. These α-boryl aldehydes are versatile intermediates for the synthesis of many Csp(3) boronate building blocks that are otherwise difficult to access. In addition, we demonstrated the utility of these types of building blocks in accessing pharmaceutically relevant targets via an iterative Csp(3) cross-coupling cycle. We have further expanded the scope of the platform to include stereochemically complex macrocyclic and polycyclic molecules using a linear-to-cyclized strategy, in which Csp(3) boronate building blocks are iteratively assembled into linear precursors that are then cyclized into the cyclic frameworks found in many natural products and natural product-like structures. Enabled by the serendipitous discovery of a catch-and-release protocol for generally purifying MIDA boronate intermediates, the platform has been automated. The synthesis of 14 distinct classes of small molecules, including pharmaceuticals, materials components, and polycyclic natural products, has been achieved using this new synthesis machine. It is anticipated that the scope of small molecules accessible by this platform will continue to expand via further developments in building block synthesis, Csp(3) cross-coupling methodologies, and cyclization strategies. Achieving these goals will enable the more generalized synthesis of small molecules and thereby help shift the rate-limiting step in small molecule science from synthesis to function.

  17. Type synthesis for 4-DOF parallel press mechanism using GF set theory

    NASA Astrophysics Data System (ADS)

    He, Jun; Gao, Feng; Meng, Xiangdun; Guo, Weizhong

    2015-07-01

    Parallel mechanisms is used in the large capacity servo press to avoid the over-constraint of the traditional redundant actuation. Currently, the researches mainly focus on the performance analysis for some specific parallel press mechanisms. However, the type synthesis and evaluation of parallel press mechanisms is seldom studied, especially for the four degrees of freedom(DOF) press mechanisms. The type synthesis of 4-DOF parallel press mechanisms is carried out based on the generalized function(GF) set theory. Five design criteria of 4-DOF parallel press mechanisms are firstly proposed. The general procedure of type synthesis of parallel press mechanisms is obtained, which includes number synthesis, symmetrical synthesis of constraint GF sets, decomposition of motion GF sets and design of limbs. Nine combinations of constraint GF sets of 4-DOF parallel press mechanisms, ten combinations of GF sets of active limbs, and eleven combinations of GF sets of passive limbs are synthesized. Thirty-eight kinds of press mechanisms are presented and then different structures of kinematic limbs are designed. Finally, the geometrical constraint complexity( GCC), kinematic pair complexity( KPC), and type complexity( TC) are proposed to evaluate the press types and the optimal press type is achieved. The general methodologies of type synthesis and evaluation for parallel press mechanism are suggested.

  18. Image segmentation by iterative parallel region growing with application to data compression and image analysis

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    1988-01-01

    Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.

  19. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    NASA Astrophysics Data System (ADS)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU-based algorithms based on existing parallelization strategies.

  20. Parallel computation of multigroup reactivity coefficient using iterative method

    NASA Astrophysics Data System (ADS)

    Susmikanti, Mike; Dewayatna, Winter

    2013-09-01

    One of the research activities to support the commercial radioisotope production program is a safety research target irradiation FPM (Fission Product Molybdenum). FPM targets form a tube made of stainless steel in which the nuclear degrees of superimposed high-enriched uranium. FPM irradiation tube is intended to obtain fission. The fission material widely used in the form of kits in the world of nuclear medicine. Irradiation FPM tube reactor core would interfere with performance. One of the disorders comes from changes in flux or reactivity. It is necessary to study a method for calculating safety terrace ongoing configuration changes during the life of the reactor, making the code faster became an absolute necessity. Neutron safety margin for the research reactor can be reused without modification to the calculation of the reactivity of the reactor, so that is an advantage of using perturbation method. The criticality and flux in multigroup diffusion model was calculate at various irradiation positions in some uranium content. This model has a complex computation. Several parallel algorithms with iterative method have been developed for the sparse and big matrix solution. The Black-Red Gauss Seidel Iteration and the power iteration parallel method can be used to solve multigroup diffusion equation system and calculated the criticality and reactivity coeficient. This research was developed code for reactivity calculation which used one of safety analysis with parallel processing. It can be done more quickly and efficiently by utilizing the parallel processing in the multicore computer. This code was applied for the safety limits calculation of irradiated targets FPM with increment Uranium.

  1. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    NASA Technical Reports Server (NTRS)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel e ciency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.

  2. Controlled iterative cross-coupling: on the way to the automation of organic synthesis.

    PubMed

    Wang, Congyang; Glorius, Frank

    2009-01-01

    Repetition does not hurt! New strategies for the modulation of the reactivity of difunctional building blocks are discussed, allowing the palladium-catalyzed controlled iterative cross-coupling and, thus, the efficient formation of complex molecules of defined size and structure (see scheme). As in peptide synthesis, this development will enable the automation of these reactions. M(PG)=protected metal, M(act)=metal.

  3. Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

    NASA Astrophysics Data System (ADS)

    Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

    2017-07-01

    Multi-thread programming using OpenMP on the shared-memory architecture with hyperthreading technology allows the resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, its speedup depends on the ability of the processor to execute threads in limited quantities, especially the sequential algorithm which contains a nested loop. The number of the outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique that had been found previously only be applied by the high-level programmer. This paper generates a parallelization procedure for low-level programmer in dealing with 2-level nested loop problems with the maximum number of threads that can be executed by a processor is smaller than the number of the outer loop iterations. Data preprocessing which is related to the number of the outer loop and the inner loop iterations, the computational time required to execute each iteration and the maximum number of threads that can be executed by a processor are used as a strategy to determine which parallel region that will produce optimal speedup.

  4. Efficient Iterative Methods Applied to the Solution of Transonic Flows

    NASA Astrophysics Data System (ADS)

    Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.

    1996-02-01

    We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.

  5. A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

    NASA Astrophysics Data System (ADS)

    Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

    2016-10-01

    Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.

  6. Multidisciplinary systems optimization by linear decomposition

    NASA Technical Reports Server (NTRS)

    Sobieski, J.

    1984-01-01

    In a typical design process major decisions are made sequentially. An illustrated example is given for an aircraft design in which the aerodynamic shape is usually decided first, then the airframe is sized for strength and so forth. An analogous sequence could be laid out for any other major industrial product, for instance, a ship. The loops in the discipline boxes symbolize iterative design improvements carried out within the confines of a single engineering discipline, or subsystem. The loops spanning several boxes depict multidisciplinary design improvement iterations. Omitted for graphical simplicity is parallelism of the disciplinary subtasks. The parallelism is important in order to develop a broad workfront necessary to shorten the design time. If all the intradisciplinary and interdisciplinary iterations were carried out to convergence, the process could yield a numerically optimal design. However, it usually stops short of that because of time and money limitations. This is especially true for the interdisciplinary iterations.

  7. Analysis and Design of ITER 1 MV Core Snubber

    NASA Astrophysics Data System (ADS)

    Wang, Haitian; Li, Ge

    2012-11-01

    The core snubber, as a passive protection device, can suppress arc current and absorb stored energy in stray capacitance during the electrical breakdown in accelerating electrodes of ITER NBI. In order to design the core snubber of ITER, the control parameters of the arc peak current have been firstly analyzed by the Fink-Baker-Owren (FBO) method, which are used for designing the DIIID 100 kV snubber. The B-H curve can be derived from the measured voltage and current waveforms, and the hysteresis loss of the core snubber can be derived using the revised parallelogram method. The core snubber can be a simplified representation as an equivalent parallel resistance and inductance, which has been neglected by the FBO method. A simulation code including the parallel equivalent resistance and inductance has been set up. The simulation and experiments result in dramatically large arc shorting currents due to the parallel inductance effect. The case shows that the core snubber utilizing the FBO method gives more compact design.

  8. Towards a Better Distributed Framework for Learning Big Data

    DTIC Science & Technology

    2017-06-14

    UNLIMITED: PB Public Release 13. SUPPLEMENTARY NOTES 14. ABSTRACT This work aimed at solving issues in distributed machine learning. The PI’s team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize a reinforcement policy learning. 15

  9. Parallel/Vector Integration Methods for Dynamical Astronomy

    NASA Astrophysics Data System (ADS)

    Fukushima, Toshio

    1999-01-01

    This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).

  10. A Laboratory Preparation of Aspartame Analogs Using Simultaneous Multiple Parallel Synthesis Methodology

    ERIC Educational Resources Information Center

    Qvit, Nir; Barda, Yaniv; Gilon, Chaim; Shalev, Deborah E.

    2007-01-01

    This laboratory experiment provides a unique opportunity for students to synthesize three analogues of aspartame, a commonly used artificial sweetener. The students are introduced to the powerful and useful method of parallel synthesis while synthesizing three dipeptides in parallel using solid-phase peptide synthesis (SPPS) and simultaneous…

  11. Efficient iterative methods applied to the solution of transonic flows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.

    1996-02-01

    We investigate the use of an inexact Newton`s method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton`s method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GIVIRES method. The preconditionermore » is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton- GIVIRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.« less

  12. Block iterative restoration of astronomical images with the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Heap, Sara R.; Lindler, Don J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.

  13. Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram

    2013-04-09

    A novel parallel algorithm for non-iterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [K. Bhaskaran-Nair, J.Brabec, E. Aprà, H.J.J. van Dam, J. Pittner, K. Kowalski, J. Chem. Phys. 137, 094112 (2012)] with the possibility of accelerating numerical calculations using graphics processing unit (GPU) is presented. We discuss the performance of this algorithm on the example of the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD (iterative singles and doubles) effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithmmore » is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.« less

  14. A Two Colorable Fourth Order Compact Difference Scheme and Parallel Iterative Solution of the 3D Convection Diffusion Equation

    NASA Technical Reports Server (NTRS)

    Zhang, Jun; Ge, Lixin; Kouatchou, Jules

    2000-01-01

    A new fourth order compact difference scheme for the three dimensional convection diffusion equation with variable coefficients is presented. The novelty of this new difference scheme is that it Only requires 15 grid points and that it can be decoupled with two colors. The entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel type iterative method. This is compared with the known 19 point fourth order compact differenCe scheme which requires four colors to decouple the computational grid. Numerical results, with multigrid methods implemented on a shared memory parallel computer, are presented to compare the 15 point and the 19 point fourth order compact schemes.

  15. Parallel Reconstruction Using Null Operations (PRUNO)

    PubMed Central

    Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

    2011-01-01

    A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290

  16. Xyce

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thomquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.

  17. On nonlinear finite element analysis in single-, multi- and parallel-processors

    NASA Technical Reports Server (NTRS)

    Utku, S.; Melosh, R.; Islam, M.; Salama, M.

    1982-01-01

    Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.

  18. Parallelization of implicit finite difference schemes in computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel

    1990-01-01

    Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.

  19. A survey of packages for large linear systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Milne, Brent

    2000-02-11

    This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less

  20. Parallel Implementation of 3-D Iterative Reconstruction With Intra-Thread Update for the jPET-D4

    NASA Astrophysics Data System (ADS)

    Lam, Chih Fung; Yamaya, Taiga; Obi, Takashi; Yoshida, Eiji; Inadama, Naoko; Shibuya, Kengo; Nishikido, Fumihiko; Murayama, Hideo

    2009-02-01

    One way to speed-up iterative image reconstruction is by parallel computing with a computer cluster. However, as the number of computing threads increases, parallel efficiency decreases due to network transfer delay. In this paper, we proposed a method to reduce data transfer between computing threads by introducing an intra-thread update. The update factor is collected from each slave thread and a global image is updated as usual in the first K sub-iteration. In the rest of the sub-iterations, the global image is only updated at an interval which is controlled by a parameter L. In between that interval, the intra-thread update is carried out whereby an image update is performed in each slave thread locally. We investigated combinations of K and L parameters based on parallel implementation of RAMLA for the jPET-D4 scanner. Our evaluation used four workstations with a total of 16 slave threads. Each slave thread calculated a different set of LORs which are divided according to ring difference numbers. We assessed image quality of the proposed method with a hotspot simulation phantom. The figure of merit was the full-width-half-maximum of hotspots and the background normalized standard deviation. At an optimum K and L setting, we did not find significant change in the output images. We also applied the proposed method to a Hoffman phantom experiment and found the difference due to intra-thread update was negligible. With the intra-thread update, computation time could be reduced by about 23%.

  1. New Bandwidth Efficient Parallel Concatenated Coding Schemes

    NASA Technical Reports Server (NTRS)

    Denedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.

    1996-01-01

    We propose a new solution to parallel concatenation of trellis codes with multilevel amplitude/phase modulations and a suitable iterative decoding structure. Examples are given for throughputs 2 bits/sec/Hz with 8PSK and 16QAM signal constellations.

  2. Parallel ICA and its hardware implementation in hyperspectral image analysis

    NASA Astrophysics Data System (ADS)

    Du, Hongtao; Qi, Hairong; Peterson, Gregory D.

    2004-04-01

    Advances in hyperspectral images have dramatically boosted remote sensing applications by providing abundant information using hundreds of contiguous spectral bands. However, the high volume of information also results in excessive computation burden. Since most materials have specific characteristics only at certain bands, a lot of these information is redundant. This property of hyperspectral images has motivated many researchers to study various dimensionality reduction algorithms, including Projection Pursuit (PP), Principal Component Analysis (PCA), wavelet transform, and Independent Component Analysis (ICA), where ICA is one of the most popular techniques. It searches for a linear or nonlinear transformation which minimizes the statistical dependence between spectral bands. Through this process, ICA can eliminate superfluous but retain practical information given only the observations of hyperspectral images. One hurdle of applying ICA in hyperspectral image (HSI) analysis, however, is its long computation time, especially for high volume hyperspectral data sets. Even the most efficient method, FastICA, is a very time-consuming process. In this paper, we present a parallel ICA (pICA) algorithm derived from FastICA. During the unmixing process, pICA divides the estimation of weight matrix into sub-processes which can be conducted in parallel on multiple processors. The decorrelation process is decomposed into the internal decorrelation and the external decorrelation, which perform weight vector decorrelations within individual processors and between cooperative processors, respectively. In order to further improve the performance of pICA, we seek hardware solutions in the implementation of pICA. Until now, there are very few hardware designs for ICA-related processes due to the complicated and iterant computation. This paper discusses capacity limitation of FPGA implementations for pICA in HSI analysis. A synthesis of Application-Specific Integrated Circuit (ASIC) is designed for pICA-based dimensionality reduction in HSI analysis. The pICA design is implemented using standard-height cells and aimed at TSMC 0.18 micron process. During the synthesis procedure, three ICA-related reconfigurable components are developed for the reuse and retargeting purpose. Preliminary results show that the standard-height cell based ASIC synthesis provide an effective solution for pICA and ICA-related processes in HSI analysis.

  3. Scalable synthesis of sequence-defined, unimolecular macromolecules by Flow-IEG

    PubMed Central

    Leibfarth, Frank A.; Johnson, Jeremiah A.; Jamison, Timothy F.

    2015-01-01

    We report a semiautomated synthesis of sequence and architecturally defined, unimolecular macromolecules through a marriage of multistep flow synthesis and iterative exponential growth (Flow-IEG). The Flow-IEG system performs three reactions and an in-line purification in a total residence time of under 10 min, effectively doubling the molecular weight of an oligomeric species in an uninterrupted reaction sequence. Further iterations using the Flow-IEG system enable an exponential increase in molecular weight. Incorporating a variety of monomer structures and branching units provides control over polymer sequence and architecture. The synthesis of a uniform macromolecule with a molecular weight of 4,023 g/mol is demonstrated. The user-friendly nature, scalability, and modularity of Flow-IEG provide a general strategy for the automated synthesis of sequence-defined, unimolecular macromolecules. Flow-IEG is thus an enabling tool for theory validation, structure–property studies, and advanced applications in biotechnology and materials science. PMID:26269573

  4. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism Correctness

    DTIC Science & Technology

    2011-12-16

    other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a...Lab affiliates National Instruments, NEC, Nokia , NVIDIA, and Samsung. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism...concurrently update x, some of these CAS’s will fail and those parallel loop iterations will recompute their updates to x and try again. Consider the parallel

  5. Learning control system design based on 2-D theory - An application to parallel link manipulator

    NASA Technical Reports Server (NTRS)

    Geng, Z.; Carroll, R. L.; Lee, J. D.; Haynes, L. H.

    1990-01-01

    An approach to iterative learning control system design based on two-dimensional system theory is presented. A two-dimensional model for the iterative learning control system which reveals the connections between learning control systems and two-dimensional system theory is established. A learning control algorithm is proposed, and the convergence of learning using this algorithm is guaranteed by two-dimensional stability. The learning algorithm is applied successfully to the trajectory tracking control problem for a parallel link robot manipulator. The excellent performance of this learning algorithm is demonstrated by the computer simulation results.

  6. Adaptive implicit-explicit and parallel element-by-element iteration schemes

    NASA Technical Reports Server (NTRS)

    Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.

    1989-01-01

    Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.

  7. Soft-output decoding algorithms in iterative decoding of turbo codes

    NASA Technical Reports Server (NTRS)

    Benedetto, S.; Montorsi, G.; Divsalar, D.; Pollara, F.

    1996-01-01

    In this article, we present two versions of a simplified maximum a posteriori decoding algorithm. The algorithms work in a sliding window form, like the Viterbi algorithm, and can thus be used to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation is also given of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes (turbo codes). The performances of the two algorithms are compared on the basis of a powerful rate 1/3 parallel concatenated code. Basic circuits to implement the simplified a posteriori decoding algorithm using lookup tables, and two further approximations (linear and threshold), with a very small penalty, to eliminate the need for lookup tables are proposed.

  8. Comparison of multihardware parallel implementations for a phase unwrapping algorithm

    NASA Astrophysics Data System (ADS)

    Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

    2018-04-01

    Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.

  9. Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

    NASA Astrophysics Data System (ADS)

    Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.

    2018-01-01

    We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.

  10. Solution of a tridiagonal system of equations on the finite element machine

    NASA Technical Reports Server (NTRS)

    Bostic, S. W.

    1984-01-01

    Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.

  11. Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, fast time- and Space -Parallel agorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completed decoupled.

  12. Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

    NASA Astrophysics Data System (ADS)

    Marx, Alain; Lütjens, Hinrich

    2017-03-01

    A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.

  13. Modified Chebyshev Picard Iteration for Efficient Numerical Integration of Ordinary Differential Equations

    NASA Astrophysics Data System (ADS)

    Macomber, B.; Woollands, R. M.; Probe, A.; Younes, A.; Bai, X.; Junkins, J.

    2013-09-01

    Modified Chebyshev Picard Iteration (MCPI) is an iterative numerical method for approximating solutions of linear or non-linear Ordinary Differential Equations (ODEs) to obtain time histories of system state trajectories. Unlike other step-by-step differential equation solvers, the Runge-Kutta family of numerical integrators for example, MCPI approximates long arcs of the state trajectory with an iterative path approximation approach, and is ideally suited to parallel computation. Orthogonal Chebyshev Polynomials are used as basis functions during each path iteration; the integrations of the Picard iteration are then done analytically. Due to the orthogonality of the Chebyshev basis functions, the least square approximations are computed without matrix inversion; the coefficients are computed robustly from discrete inner products. As a consequence of discrete sampling and weighting adopted for the inner product definition, Runge phenomena errors are minimized near the ends of the approximation intervals. The MCPI algorithm utilizes a vector-matrix framework for computational efficiency. Additionally, all Chebyshev coefficients and integrand function evaluations are independent, meaning they can be simultaneously computed in parallel for further decreased computational cost. Over an order of magnitude speedup from traditional methods is achieved in serial processing, and an additional order of magnitude is achievable in parallel architectures. This paper presents a new MCPI library, a modular toolset designed to allow MCPI to be easily applied to a wide variety of ODE systems. Library users will not have to concern themselves with the underlying mathematics behind the MCPI method. Inputs are the boundary conditions of the dynamical system, the integrand function governing system behavior, and the desired time interval of integration, and the output is a time history of the system states over the interval of interest. Examples from the field of astrodynamics are presented to compare the output from the MCPI library to current state-of-practice numerical integration methods. It is shown that MCPI is capable of out-performing the state-of-practice in terms of computational cost and accuracy.

  14. Parallel fast multipole boundary element method applied to computational homogenization

    NASA Astrophysics Data System (ADS)

    Ptaszny, Jacek

    2018-01-01

    In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problem is developed and applied to the computational homogenization of a solid containing spherical voids. The system of equation is solved by using the GMRES iterative solver. The boundary of the body is dicretized by using the quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of number of threads is examined.

  15. Parallel iterative methods for sparse linear and nonlinear equations

    NASA Technical Reports Server (NTRS)

    Saad, Youcef

    1989-01-01

    As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable, when solving linear as well as nonlinear systems of equations. There has been several different approaches taken to adapt iterative methods for supercomputers. Some of these approaches are discussed and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.

  16. The Battlefield Environment Division Modeling Framework (BMF). Part 1: Optimizing the Atmospheric Boundary Layer Environment Model for Cluster Computing

    DTIC Science & Technology

    2014-02-01

    idle waiting for the wavefront to reach it. To overcome this, Reeve et al. (2001) 3 developed a scheme in analogy to the red-black Gauss - Seidel iterative ...understandable procedure calls. Parallelization of the SIMPLE iterative scheme with SIP used a red-black scheme similar to the red-black Gauss - Seidel ...scheme, the SIMPLE method, for pressure-velocity coupling. The result is a slowing convergence of the outer iterations . The red-black scheme excites a 2

  17. Scalable domain decomposition solvers for stochastic PDEs in high performance computing

    DOE PAGES

    Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

    2017-09-21

    Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less

  18. Scalable domain decomposition solvers for stochastic PDEs in high performance computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Desai, Ajit; Khalil, Mohammad; Pettit, Chris

    Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less

  19. Development of parallel algorithms for electrical power management in space applications

    NASA Technical Reports Server (NTRS)

    Berry, Frederick C.

    1989-01-01

    The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a loan flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.

  20. Parallel microfluidic synthesis of size-tunable polymeric nanoparticles using 3D flow focusing towards in vivo study

    PubMed Central

    Lim, Jong-Min; Bertrand, Nicolas; Valencia, Pedro M.; Rhee, Minsoung; Langer, Robert; Jon, Sangyong; Farokhzad, Omid C.; Karnik, Rohit

    2014-01-01

    Microfluidic synthesis of nanoparticles (NPs) can enhance the controllability and reproducibility in physicochemical properties of NPs compared to bulk synthesis methods. However, applications of microfluidic synthesis are typically limited to in vitro studies due to low production rates. Herein, we report the parallelization of NP synthesis by 3D hydrodynamic flow focusing (HFF) using a multilayer microfluidic system to enhance the production rate without losing the advantages of reproducibility, controllability, and robustness. Using parallel 3D HFF, polymeric poly(lactide-co-glycolide)-b-polyethyleneglycol (PLGA-PEG) NPs with sizes tunable in the range of 13–150 nm could be synthesized reproducibly with high production rate. As a proof of concept, we used this system to perform in vivo pharmacokinetic and biodistribution study of small (20 nm diameter) PLGA-PEG NPs that are otherwise difficult to synthesize. Microfluidic parallelization thus enables synthesis of NPs with tunable properties with production rates suitable for both in vitro and in vivo studies. PMID:23969105

  1. Research in computer science

    NASA Technical Reports Server (NTRS)

    Ortega, J. M.

    1986-01-01

    Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.

  2. Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

    NASA Technical Reports Server (NTRS)

    Padovan, Joseph; Krishna, Lala; Gute, Douglas

    1997-01-01

    Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.

  3. Adapting high-level language programs for parallel processing using data flow

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  4. Computational Issues in Damping Identification for Large Scale Problems

    NASA Technical Reports Server (NTRS)

    Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.

    1997-01-01

    Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least squares method. Numerical simulations have been performed on multiple degree-of-freedom models to test the effectiveness of the algorithm and the usefulness of parallel computation for the problems. High Performance Fortran is used to parallelize the algorithm. Tests were performed using the IBM-SP2 at NASA Ames Research Center. The least squares method tested incurs high communication costs, which reduces the benefit of high performance computing. This method's memory requirement grows at a very rapid rate meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can he seen for problems of 100+ degrees-of-freedom.

  5. Process optimization using combinatorial design principles: parallel synthesis and design of experiment methods.

    PubMed

    Gooding, Owen W

    2004-06-01

    The use of parallel synthesis techniques with statistical design of experiment (DoE) methods is a powerful combination for the optimization of chemical processes. Advances in parallel synthesis equipment and easy to use software for statistical DoE have fueled a growing acceptance of these techniques in the pharmaceutical industry. As drug candidate structures become more complex at the same time that development timelines are compressed, these enabling technologies promise to become more important in the future.

  6. Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

    NASA Technical Reports Server (NTRS)

    Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

    1989-01-01

    Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.

  7. Parallel heuristics for scalable community detection

    DOE PAGES

    Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth

    2015-08-14

    Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less

  8. Application of lean manufacturing concepts to drug discovery: rapid analogue library synthesis.

    PubMed

    Weller, Harold N; Nirschl, David S; Petrillo, Edward W; Poss, Michael A; Andres, Charles J; Cavallaro, Cullen L; Echols, Martin M; Grant-Young, Katherine A; Houston, John G; Miller, Arthur V; Swann, R Thomas

    2006-01-01

    The application of parallel synthesis to lead optimization programs in drug discovery has been an ongoing challenge since the first reports of library synthesis. A number of approaches to the application of parallel array synthesis to lead optimization have been attempted over the years, ranging from widespread deployment by (and support of) individual medicinal chemists to centralization as a service by an expert core team. This manuscript describes our experience with the latter approach, which was undertaken as part of a larger initiative to optimize drug discovery. In particular, we highlight how concepts taken from the manufacturing sector can be applied to drug discovery and parallel synthesis to improve the timeliness and thus the impact of arrays on drug discovery.

  9. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald.

    PubMed

    Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

    2014-02-28

    In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.

  10. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald

    PubMed Central

    Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

    2015-01-01

    In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230

  11. Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

    DOE PAGES

    Shao, Meiyue; Aktulga, H.  Metin; Yang, Chao; ...

    2017-09-14

    In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less

  12. Iterative load-balancing method with multigrid level relaxation for particle simulation with short-range interactions

    NASA Astrophysics Data System (ADS)

    Furuichi, Mikito; Nishiura, Daisuke

    2017-10-01

    We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.

  13. Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shao, Meiyue; Aktulga, H.  Metin; Yang, Chao

    In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less

  14. Performance Analysis of Iterative Channel Estimation and Multiuser Detection in Multipath DS-CDMA Channels

    NASA Astrophysics Data System (ADS)

    Li, Husheng; Betz, Sharon M.; Poor, H. Vincent

    2007-05-01

    This paper examines the performance of decision feedback based iterative channel estimation and multiuser detection in channel coded aperiodic DS-CDMA systems operating over multipath fading channels. First, explicit expressions describing the performance of channel estimation and parallel interference cancellation based multiuser detection are developed. These results are then combined to characterize the evolution of the performance of a system that iterates among channel estimation, multiuser detection and channel decoding. Sufficient conditions for convergence of this system to a unique fixed point are developed.

  15. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    NASA Astrophysics Data System (ADS)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.

  16. Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis

    NASA Technical Reports Server (NTRS)

    Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.

    2005-01-01

    A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategics and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate thc steps in the formulation and large-scale (l,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a duster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environmental and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.

  17. Performance and capacity analysis of Poisson photon-counting based Iter-PIC OCDMA systems.

    PubMed

    Li, Lingbin; Zhou, Xiaolin; Zhang, Rong; Zhang, Dingchen; Hanzo, Lajos

    2013-11-04

    In this paper, an iterative parallel interference cancellation (Iter-PIC) technique is developed for optical code-division multiple-access (OCDMA) systems relying on shot-noise limited Poisson photon-counting reception. The novel semi-analytical tool of extrinsic information transfer (EXIT) charts is used for analysing both the bit error rate (BER) performance as well as the channel capacity of these systems and the results are verified by Monte Carlo simulations. The proposed Iter-PIC OCDMA system is capable of achieving two orders of magnitude BER improvements and a 0.1 nats of capacity improvement over the conventional chip-level OCDMA systems at a coding rate of 1/10.

  18. Analysis of Monte Carlo accelerated iterative methods for sparse linear systems: Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

    DOE PAGES

    Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

    2017-03-05

    Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.

  19. Extending substructure based iterative solvers to multiple load and repeated analyses

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel

    1993-01-01

    Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.

  20. Domain decomposition preconditioners for the spectral collocation method

    NASA Technical Reports Server (NTRS)

    Quarteroni, Alfio; Sacchilandriani, Giovanni

    1988-01-01

    Several block iteration preconditioners are proposed and analyzed for the solution of elliptic problems by spectral collocation methods in a region partitioned into several rectangles. It is shown that convergence is achieved with a rate which does not depend on the polynomial degree of the spectral solution. The iterative methods here presented can be effectively implemented on multiprocessor systems due to their high degree of parallelism.

  1. Fluorous Parallel Synthesis of A Hydantoin/Thiohydantoin Library

    PubMed Central

    Lu, Yimin; Zhang, Wei

    2007-01-01

    Fluorous tagging strategy is applied to solution-phase parallel synthesis of a library containing hydantoin and thiohydantoin analogs. Two perfluoroalkyl (Rf)-tagged α-amino esters each react with 6 aromatic aldehydes under reductive amination conditions. Twelve amino esters then each react with 10 isocyanates and isothiocyanates in parallel. The resulting 120 ureas and thioureas undergo spontaneous cyclization to form the corresponding hydantoins and thiohydantoins. The intermediate and final product purifications are performed with solid-phase extraction (SPE) over FluoroFlash™ cartridges, no chromatography is required. Using standard instruments and straightforward SPE technique, one chemist accomplished the 120-member library synthesis in less than 5 working days, including starting material synthesis and product analysis. PMID:15789556

  2. Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

    PubMed

    Hielscher, Andreas H; Bartel, Sebastian

    2004-02-01

    Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.

  3. Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

    DOE PAGES

    Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

    2016-11-07

    Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less

  4. Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

    Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less

  5. Dissipation, Voltage Profile and Levy Dragon in a Special Ladder Network

    ERIC Educational Resources Information Center

    Ucak, C.

    2009-01-01

    A ladder network constructed by an elementary two-terminal network consisting of a parallel resistor-inductor block in series with a parallel resistor-capacitor block sometimes is said to have a non-dispersive dissipative response. This special ladder network is created iteratively by replacing the elementary two-terminal network in place of the…

  6. Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.

    PubMed

    Palkowski, Marek; Bielecki, Wlodzimierz

    2018-01-15

    RNA folding is an ongoing compute-intensive task of bioinformatics. Parallelization and improving code locality for this kind of algorithms is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques are within the iteration space slicing framework - the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem at generating parallel tiled code is defining a proper tile size and tile dimension which impact parallelism degree and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (parameters are variables defining tile size). With this purpose, we first generate two nonparametric tiled codes with different fixed tile sizes but with the same code structure and then derive a general affine model, which describes all integer factors available in expressions of those codes. Using this model and known integer factors present in the mentioned expressions (they define the left-hand side of the model), we find unknown integers in this model for each integer factor available in the same fixed tiled code position and replace in this code expressions, including integer factors, with those including parameters. Then we use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover in a given search space the best tile size and tile dimension maximizing target code performance. For a given search space, the presented approach allows us to choose the best tile size and tile dimension in parallel tiled code implementing Nussinov's RNA folding. Experimental results, received on modern Intel multi-core processors, demonstrate that this code outperforms known closely related implementations when the length of RNA strands is bigger than 2500.

  7. A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*

    PubMed Central

    Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.

    2012-01-01

    This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200

  8. Myria: Scalable Analytics as a Service

    NASA Astrophysics Data System (ADS)

    Howe, B.; Halperin, D.; Whitaker, A.

    2014-12-01

    At the UW eScience Institute, we're working to empower non-experts, especially in the sciences, to write and use data-parallel algorithms. To this end, we are building Myria, a web-based platform for scalable analytics and data-parallel programming. Myria's internal model of computation is the relational algebra extended with iteration, such that every program is inherently data-parallel, just as every query in a database is inherently data-parallel. But unlike databases, iteration is a first class concept, allowing us to express machine learning tasks, graph traversal tasks, and more. Programs can be expressed in a number of languages and can be executed on a number of execution environments, but we emphasize a particular language called MyriaL that supports both imperative and declarative styles and a particular execution engine called MyriaX that uses an in-memory column-oriented representation and asynchronous iteration. We deliver Myria over the web as a service, providing an editor, performance analysis tools, and catalog browsing features in a single environment. We find that this web-based "delivery vector" is critical in reaching non-experts: they are insulated from irrelevant effort technical work associated with installation, configuration, and resource management. The MyriaX backend, one of several execution runtimes we support, is a main-memory, column-oriented, RDBMS-on-the-worker system that supports cyclic data flows as a first-class citizen and has been shown to outperform competitive systems on 100-machine cluster sizes. I will describe the Myria system, give a demo, and present some new results in large-scale oceanographic microbiology.

  9. Global analysis of protein folding using massively parallel design, synthesis and testing

    PubMed Central

    Rocklin, Gabriel J.; Chidyausiku, Tamuka M.; Goreshnik, Inna; Ford, Alex; Houliston, Scott; Lemak, Alexander; Carter, Lauren; Ravichandran, Rashmi; Mulligan, Vikram K.; Chevalier, Aaron; Arrowsmith, Cheryl H.; Baker, David

    2017-01-01

    Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Though these forces are “encoded” in the thousands of known protein structures, “decoding” them is challenging due to the complexity of natural proteins that have evolved for function, not stability. Here we combine computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for over 15,000 de novo designed miniproteins, 1,000 natural proteins, 10,000 point-mutants, and 30,000 negative control sequences, identifying over 2,500 new stable designed proteins in four basic folds. This scale—three orders of magnitude greater than that of previous studies of design or folding—enabled us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment, and promises to transform computational protein design into a data-driven science. PMID:28706065

  10. An Old Story in the Parallel Synthesis World: An Approach to Hydantoin Libraries.

    PubMed

    Bogolubsky, Andrey V; Moroz, Yurii S; Savych, Olena; Pipko, Sergey; Konovets, Angelika; Platonov, Maxim O; Vasylchenko, Oleksandr V; Hurmach, Vasyl V; Grygorenko, Oleksandr O

    2018-01-08

    An approach to the parallel synthesis of hydantoin libraries by reaction of in situ generated 2,2,2-trifluoroethylcarbamates and α-amino esters was developed. To demonstrate utility of the method, a library of 1158 hydantoins designed according to the lead-likeness criteria (MW 200-350, cLogP 1-3) was prepared. The success rate of the method was analyzed as a function of physicochemical parameters of the products, and it was found that the method can be considered as a tool for lead-oriented synthesis. A hydantoin-bearing submicromolar primary hit acting as an Aurora kinase A inhibitor was discovered with a combination of rational design, parallel synthesis using the procedures developed, in silico and in vitro screenings.

  11. The multigrid preconditioned conjugate gradient method

    NASA Technical Reports Server (NTRS)

    Tatebe, Osamu

    1993-01-01

    A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, it is considered a necessary condition of the multigrid preconditioner in order to satisfy requirements of a preconditioner of the PCG method. Next numerical experiments show a behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in point of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.

  12. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Sankar, Lakshmi N.; Hixon, Duane

    1991-01-01

    Efficient iterative solution methods are being developed for the numerical solution of two- and three-dimensional compressible Navier-Stokes equations. Iterative time marching methods have several advantages over classical multi-step explicit time marching schemes, and non-iterative implicit time marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes. Thus, the extra work required by iterative schemes can also be designed to perform efficiently on current and future generation scalable, missively parallel machines. An obvious candidate for iteratively solving the system of coupled nonlinear algebraic equations arising in CFD applications is the Newton method. Newton's method was implemented in existing finite difference and finite volume methods. Depending on the complexity of the problem, the number of Newton iterations needed per step to solve the discretized system of equations can, however, vary dramatically from a few to several hundred. Another popular approach based on the classical conjugate gradient method, known as the GMRES (Generalized Minimum Residual) algorithm is investigated. The GMRES algorithm was used in the past by a number of researchers for solving steady viscous and inviscid flow problems with considerable success. Here, the suitability of this algorithm is investigated for solving the system of nonlinear equations that arise in unsteady Navier-Stokes solvers at each time step. Unlike the Newton method which attempts to drive the error in the solution at each and every node down to zero, the GMRES algorithm only seeks to minimize the L2 norm of the error. In the GMRES algorithm the changes in the flow properties from one time step to the next are assumed to be the sum of a set of orthogonal vectors. By choosing the number of vectors to a reasonably small value N (between 5 and 20) the work required for advancing the solution from one time step to the next may be kept to (N+1) times that of a noniterative scheme. Many of the operations required by the GMRES algorithm such as matrix-vector multiplies, matrix additions and subtractions can all be vectorized and parallelized efficiently.

  13. LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*

    PubMed Central

    Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.

    2014-01-01

    We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to minx∈ℝn ‖Ax − b‖2, where A ∈ ℝm × n with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094

  14. Performance Implications of Synchronization Support for Parallel FORTRAN Programs

    DTIC Science & Technology

    1991-06-17

    applications we used in this study are BDNA and FLO52. BDNA is a molecular dy- I namics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all 32 I I 100 BDNA -4 FLO52 -I 80 3 CumuilatQe percentage of3

  15. A highly parallel multigrid-like method for the solution of the Euler equations

    NASA Technical Reports Server (NTRS)

    Tuminaro, Ray S.

    1989-01-01

    We consider a highly parallel multigrid-like method for the solution of the two dimensional steady Euler equations. The new method, introduced as filtering multigrid, is similar to a standard multigrid scheme in that convergence on the finest grid is accelerated by iterations on coarser grids. In the filtering method, however, additional fine grid subproblems are processed concurrently with coarse grid computations to further accelerate convergence. These additional problems are obtained by splitting the residual into a smooth and an oscillatory component. The smooth component is then used to form a coarse grid problem (similar to standard multigrid) while the oscillatory component is used for a fine grid subproblem. The primary advantage in the filtering approach is that fewer iterations are required and that most of the additional work per iteration can be performed in parallel with the standard coarse grid computations. We generalize the filtering algorithm to a version suitable for nonlinear problems. We emphasize that this generalization is conceptually straight-forward and relatively easy to implement. In particular, no explicit linearization (e.g., formation of Jacobians) needs to be performed (similar to the FAS multigrid approach). We illustrate the nonlinear version by applying it to the Euler equations, and presenting numerical results. Finally, a performance evaluation is made based on execution time models and convergence information obtained from numerical experiments.

  16. Synthesis of Chemiluminescent Esters: A Combinatorial Synthesis Experiment for Organic Chemistry Students

    ERIC Educational Resources Information Center

    Duarte, Robert; Nielson, Janne T.; Dragojlovic, Veljko

    2004-01-01

    A group of techniques aimed at synthesizing a large number of structurally diverse compounds is called combinatorial synthesis. Synthesis of chemiluminescence esters using parallel combinatorial synthesis and mix-and-split combinatorial synthesis is experimented.

  17. Performance Analysis and Design Synthesis (PADS) computer program. Volume 2: Program description, part 2

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The QL module of the Performance Analysis and Design Synthesis (PADS) computer program is described. Execution of this module is initiated when and if subroutine PADSI calls subroutine GROPE. Subroutine GROPE controls the high level logical flow of the QL module. The purpose of the module is to determine a trajectory that satisfies the necessary variational conditions for optimal performance. The module achieves this by solving a nonlinear multi-point boundary value problem. The numerical method employed is described. It is an iterative technique that converges quadratically when it does converge. The three basic steps of the module are: (1) initialization, (2) iteration, and (3) culmination. For Volume 1 see N73-13199.

  18. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particlesmore » and in grid related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.« less

  19. Enabling Chemistry Technologies and Parallel Synthesis-Accelerators of Drug Discovery Programmes.

    PubMed

    Vasudevan, A; Bogdan, A R; Koolman, H F; Wang, Y; Djuric, S W

    There is a pressing need to improve overall productivity in the pharmaceutical industry. Judicious investments in chemistry technologies can have a significant impact on cycle times, cost of goods and probability of technical success. This perspective describes some of these technologies developed and implemented at AbbVie, and their applications to the synthesis of novel scaffolds and to parallel synthesis. © 2017 Elsevier B.V. All rights reserved.

  20. Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

    NASA Astrophysics Data System (ADS)

    Arteaga, Santiago Egido

    1998-12-01

    The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the linearization strategies considered and whose computational cost is negligible. The algebraic properties of these systems depend on both the discretization and nonlinear method used. We study in detail the positive definiteness and skewsymmetry of the advection submatrices (essentially, convection-diffusion problems). We propose a discretization based on a new trilinear form for Newton's method. We solve the linear systems using three Krylov subspace methods, GMRES, QMR and TFQMR, and compare the advantages of each. Our emphasis is on parallel algorithms, and so we consider preconditioners suitable for parallel computers such as line variants of the Jacobi and Gauss- Seidel methods, alternating direction implicit methods, and Chebyshev and least squares polynomial preconditioners. These work well for moderate viscosities (moderate Reynolds number). For small viscosities we show that effective parallel solution of the advection subproblem is a critical factor to improve performance. Implementation details on a CM-5 are presented.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

    CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise tomore » extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)« less

  2. Fast in-memory elastic full-waveform inversion using consumer-grade GPUs

    NASA Astrophysics Data System (ADS)

    Sivertsen Bergslid, Tore; Birger Raknes, Espen; Arntsen, Børge

    2017-04-01

    Full-waveform inversion (FWI) is a technique to estimate subsurface properties by using the recorded waveform produced by a seismic source and applying inverse theory. This is done through an iterative optimization procedure, where each iteration requires solving the wave equation many times, then trying to minimize the difference between the modeled and the measured seismic data. Having to model many of these seismic sources per iteration means that this is a highly computationally demanding procedure, which usually involves writing a lot of data to disk. We have written code that does forward modeling and inversion entirely in memory. A typical HPC cluster has many more CPUs than GPUs. Since FWI involves modeling many seismic sources per iteration, the obvious approach is to parallelize the code on a source-by-source basis, where each core of the CPU performs one modeling, and do all modelings simultaneously. With this approach, the GPU is already at a major disadvantage in pure numbers. Fortunately, GPUs can more than make up for this hardware disadvantage by performing each modeling much faster than a CPU. Another benefit of parallelizing each individual modeling is that it lets each modeling use a lot more RAM. If one node has 128 GB of RAM and 20 CPU cores, each modeling can use only 6.4 GB RAM if one is running the node at full capacity with source-by-source parallelization on the CPU. A parallelized per-source code using GPUs can use 64 GB RAM per modeling. Whenever a modeling uses more RAM than is available and has to start using regular disk space the runtime increases dramatically, due to slow file I/O. The extremely high computational speed of the GPUs combined with the large amount of RAM available for each modeling lets us do high frequency FWI for fairly large models very quickly. For a single modeling, our GPU code outperforms the single-threaded CPU-code by a factor of about 75. Successful inversions have been run on data with frequencies up to 40 Hz for a model of 2001 by 600 grid points with 5 m grid spacing and 5000 time steps, in less than 2.5 minutes per source. In practice, using 15 nodes (30 GPUs) to model 101 sources, each iteration took approximately 9 minutes. For reference, the same inversion run with our CPU code uses two hours per iteration. This was done using only a very simple wavefield interpolation technique, saving every second timestep. Using a more sophisticated checkpointing or wavefield reconstruction method would allow us to increase this model size significantly. Our results show that ordinary gaming GPUs are a viable alternative to the expensive professional GPUs often used today, when performing large scale modeling and inversion in geophysics.

  3. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  4. Synthesis of a Fluorescent Acridone Using a Grignard Addition, Oxidation, and Nucleophilic Aromatic Substitution Reaction Sequence

    ERIC Educational Resources Information Center

    Goodrich, Samuel; Patel, Miloni; Woydziak, Zachary R.

    2015-01-01

    A three-pot synthesis oriented for an undergraduate organic chemistry laboratory was developed to construct a fluorescent acridone molecule. This laboratory experiment utilizes Grignard addition to an aldehyde, alcohol oxidation, and iterative nucleophilic aromatic substitution steps to produce the final product. Each of the intermediates and the…

  5. Solution-phase parallel synthesis of aryloxyimino amides via a novel multicomponent reaction among aromatic (Z)-chlorooximes, isocyanides, and electron-deficient phenols.

    PubMed

    Mercalli, Valentina; Giustiniano, Mariateresa; Del Grosso, Erika; Varese, Monica; Cassese, Hilde; Massarotti, Alberto; Novellino, Ettore; Tron, Gian Cesare

    2014-11-10

    A library of 41 aryloxyimino amides was prepared via solution phase parallel synthesis by extending the multicomponent reaction of (Z)-chlorooximes and isocyanides to the use of electron-deficient phenols. The resulting aryloxyiminoamide derivatives can be used as intermediates for the synthesis of benzo[d]isoxazole-3-carboxamides, dramatically reducing the number of synthetic steps required by other methods reported in literature.

  6. Aztec user`s guide. Version 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

    1995-10-01

    Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparsemore » unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.« less

  7. SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

    1990-01-01

    Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.

  8. Iterative nonlinear joint transform correlation for the detection of objects in cluttered scenes

    NASA Astrophysics Data System (ADS)

    Haist, Tobias; Tiziani, Hans J.

    1999-03-01

    An iterative correlation technique with digital image processing in the feedback loop for the detection of small objects in cluttered scenes is proposed. A scanning aperture is combined with the method in order to improve the immunity against noise and clutter. Multiple reference objects or different views of one object are processed in parallel. We demonstrate the method by detecting a noisy and distorted face in a crowd with a nonlinear joint transform correlator.

  9. Research progress on synthesis and characteristic about dendrimers

    NASA Astrophysics Data System (ADS)

    Tang, Zitao

    2017-12-01

    Dendrimers are hyper-branched polymers which have perfectly defined structures. Different from the common polymers, dendrimers are synthesized by a step-by-step iterative style, which starts from a central core and forms branching parts outward. The dendrimers also have different physical and chemical characteristics from common polymers. In this paper, contributions to dendrimer synthesis from different researchers with different scientific background, synthesis of different dendrimers, and applications of them will be reviewed.

  10. LDRD final report on massively-parallel linear programming : the parPCx system.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less

  11. AZTEC. Parallel Iterative method Software for Solving Linear Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hutchinson, S.; Shadid, J.; Tuminaro, R.

    1995-07-01

    AZTEC is an interactive library that greatly simplifies the parrallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matricesmore » for parallel solutions.« less

  12. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  13. Communications oriented programming of parallel iterative solutions of sparse linear systems

    NASA Technical Reports Server (NTRS)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  14. Radiation pattern synthesis of planar antennas using the iterative sampling method

    NASA Technical Reports Server (NTRS)

    Stutzman, W. L.; Coffey, E. L.

    1975-01-01

    A synthesis method is presented for determining an excitation of an arbitrary (but fixed) planar source configuration. The desired radiation pattern is specified over all or part of the visible region. It may have multiple and/or shaped main beams with low sidelobes. The iterative sampling method is used to find an excitation of the source which yields a radiation pattern that approximates the desired pattern to within a specified tolerance. In this paper the method is used to calculate excitations for line sources, linear arrays (equally and unequally spaced), rectangular apertures, rectangular arrays (arbitrary spacing grid), and circular apertures. Examples using these sources to form patterns with shaped main beams, multiple main beams, shaped sidelobe levels, and combinations thereof are given.

  15. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.

  16. A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

    PubMed

    Liang, Faming; Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.

  17. A Parallel Fast Sweeping Method for the Eikonal Equation

    NASA Astrophysics Data System (ADS)

    Baker, B.

    2017-12-01

    Recently, there has been an exciting emergence of probabilistic methods for travel time tomography. Unlike gradient-based optimization strategies, probabilistic tomographic methods are resistant to becoming trapped in a local minimum and provide a much better quantification of parameter resolution than, say, appealing to ray density or performing checkerboard reconstruction tests. The benefits associated with random sampling methods however are only realized by successive computation of predicted travel times in, potentially, strongly heterogeneous media. To this end this abstract is concerned with expediting the solution of the Eikonal equation. While many Eikonal solvers use a fast marching method, the proposed solver will use the iterative fast sweeping method because the eight fixed sweep orderings in each iteration are natural targets for parallelization. To reduce the number of iterations and grid points required the high-accuracy finite difference stencil of Nobel et al., 2014 is implemented. A directed acyclic graph (DAG) is created with a priori knowledge of the sweep ordering and finite different stencil. By performing a topological sort of the DAG sets of independent nodes are identified as candidates for concurrent updating. Additionally, the proposed solver will also address scalability during earthquake relocation, a necessary step in local and regional earthquake tomography and a barrier to extending probabilistic methods from active source to passive source applications, by introducing an asynchronous parallel forward solve phase for all receivers in the network. Synthetic examples using the SEG over-thrust model will be presented.

  18. A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

    PubMed Central

    Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469

  19. Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Keni; Wu, Yu-Shu

    2006-04-17

    TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less

  20. Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Li, Y.

    2016-12-01

    We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multi-sources and parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm and model perturbations at each iteration step are obtained by using the Inexact Conjugate Gradient iteration method. Synthetic test inversions are presented.

  1. Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carleton, James Brian; Parks, Michael L.

    Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less

  2. Iterative image reconstruction for PROPELLER-MRI using the nonuniform fast fourier transform.

    PubMed

    Tamhane, Ashish A; Anastasio, Mark A; Gui, Minzhi; Arfanakis, Konstantinos

    2010-07-01

    To investigate an iterative image reconstruction algorithm using the nonuniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction) MRI. Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it with that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased signal to noise ratio, reduced artifacts, for similar spatial resolution, compared with gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter, the new reconstruction technique may provide PROPELLER images with improved image quality compared with conventional gridding. (c) 2010 Wiley-Liss, Inc.

  3. Iterative Image Reconstruction for PROPELLER-MRI using the NonUniform Fast Fourier Transform

    PubMed Central

    Tamhane, Ashish A.; Anastasio, Mark A.; Gui, Minzhi; Arfanakis, Konstantinos

    2013-01-01

    Purpose To investigate an iterative image reconstruction algorithm using the non-uniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping parallEL Lines with Enhanced Reconstruction) MRI. Materials and Methods Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it to that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. Results It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased SNR, reduced artifacts, for similar spatial resolution, compared to gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. Conclusion An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter the new reconstruction technique may provide PROPELLER images with improved image quality compared to conventional gridding. PMID:20578028

  4. Exploring the Ability of a Coarse-grained Potential to Describe the Stress-strain Response of Glassy Polystyrene

    DTIC Science & Technology

    2012-10-01

    using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS

  5. Real-Time, Multiple, Pan/Tilt/Zoom, Computer Vision Tracking, and 3D Position Estimating System for Unmanned Aerial System Metrology

    DTIC Science & Technology

    2013-10-18

    of the enclosed tasks plus the last parallel task for a total of five parallel tasks for one iteration, i). for j = 1…N for i = 1… 8 Figure...drizzling juices culminating in a state of salivating desire to cut a piece and enjoy. On the other hand, the smell could be that of a pungent, unpleasant

  6. Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

    NASA Astrophysics Data System (ADS)

    Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

    2016-07-01

    Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.

  7. Parallelizing quantum circuit synthesis

    NASA Astrophysics Data System (ADS)

    Di Matteo, Olivia; Mosca, Michele

    2016-03-01

    Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries of not only the size of solvable circuit synthesis problems, but also in what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which once a starting point is chosen, its path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup on the runtime, as well as directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.

  8. Methodes iteratives paralleles: Applications en neutronique et en mecanique des fluides

    NASA Astrophysics Data System (ADS)

    Qaddouri, Abdessamad

    Dans cette these, le calcul parallele est applique successivement a la neutronique et a la mecanique des fluides. Dans chacune de ces deux applications, des methodes iteratives sont utilisees pour resoudre le systeme d'equations algebriques resultant de la discretisation des equations du probleme physique. Dans le probleme de neutronique, le calcul des matrices des probabilites de collision (PC) ainsi qu'un schema iteratif multigroupe utilisant une methode inverse de puissance sont parallelises. Dans le probleme de mecanique des fluides, un code d'elements finis utilisant un algorithme iteratif du type GMRES preconditionne est parallelise. Cette these est presentee sous forme de six articles suivis d'une conclusion. Les cinq premiers articles traitent des applications en neutronique, articles qui representent l'evolution de notre travail dans ce domaine. Cette evolution passe par un calcul parallele des matrices des PC et un algorithme multigroupe parallele teste sur un probleme unidimensionnel (article 1), puis par deux algorithmes paralleles l'un mutiregion l'autre multigroupe, testes sur des problemes bidimensionnels (articles 2--3). Ces deux premieres etapes sont suivies par l'application de deux techniques d'acceleration, le rebalancement neutronique et la minimisation du residu aux deux algorithmes paralleles (article 4). Finalement, on a mis en oeuvre l'algorithme multigroupe et le calcul parallele des matrices des PC sur un code de production DRAGON ou les tests sont plus realistes et peuvent etre tridimensionnels (article 5). Le sixieme article (article 6), consacre a l'application a la mecanique des fluides, traite la parallelisation d'un code d'elements finis FES ou le partitionneur de graphe METIS et la librairie PSPARSLIB sont utilises.

  9. Generalizations of the Alternating Direction Method of Multipliers for Large-Scale and Distributed Optimization

    DTIC Science & Technology

    2014-05-01

    exact one is solved later — as- signed as step 5 of Algorithm 2 — because at each iteration , the ADMM updates the variables in the Gauss - Seidel ...Jacobi ADMM (see Algo- rithm 5 below). Unlike the Gauss - Seidel ADMM, the Jacobi ADMM updates all the 70 blocks in parallel at every iteration : xk+1i...that extending ADMM straightforwardly from the classic Gauss - Seidel setting to the Jacobi setting, from two blocks to multiple blocks, will preserve

  10. Inversion of potential field data using the finite element method on parallel computers

    NASA Astrophysics Data System (ADS)

    Gross, L.; Altinay, C.; Shaw, S.

    2015-11-01

    In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs namely for the potential fields, for the adjoint defects and for the application of the preconditioner. In extension to the traditional discrete formulation the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by weighting regularization and cross-gradient but is independent of the resolution of PDE discretization and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.

  11. Scalable splitting algorithms for big-data interferometric imaging in the SKA era

    NASA Astrophysics Data System (ADS)

    Onose, Alexandru; Carrillo, Rafael E.; Repetti, Audrey; McEwen, Jason D.; Thiran, Jean-Philippe; Pesquet, Jean-Christophe; Wiaux, Yves

    2016-11-01

    In the context of next-generation radio telescopes, like the Square Kilometre Array (SKA), the efficient processing of large-scale data sets is extremely important. Convex optimization tasks under the compressive sensing framework have recently emerged and provide both enhanced image reconstruction quality and scalability to increasingly larger data sets. We focus herein mainly on scalability and propose two new convex optimization algorithmic structures able to solve the convex optimization tasks arising in radio-interferometric imaging. They rely on proximal splitting and forward-backward iterations and can be seen, by analogy, with the CLEAN major-minor cycle, as running sophisticated CLEAN-like iterations in parallel in multiple data, prior, and image spaces. Both methods support any convex regularization function, in particular, the well-studied ℓ1 priors promoting image sparsity in an adequate domain. Tailored for big-data, they employ parallel and distributed computations to achieve scalability, in terms of memory and computational requirements. One of them also exploits randomization, over data blocks at each iteration, offering further flexibility. We present simulation results showing the feasibility of the proposed methods as well as their advantages compared to state-of-the-art algorithmic solvers. Our MATLAB code is available online on GitHub.

  12. Performance evaluation of canny edge detection on a tiled multicore architecture

    NASA Astrophysics Data System (ADS)

    Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

    2011-01-01

    In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP,1 and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that for the same number of threads, programmer implemented, domain decomposition exhibits higher speed-ups than the compiler managed, loop-level parallelism implemented with OpenMP.

  13. FPGA-Based Filterbank Implementation for Parallel Digital Signal Processing

    NASA Technical Reports Server (NTRS)

    Berner, Stephan; DeLeon, Phillip

    1999-01-01

    One approach to parallel digital signal processing decomposes a high bandwidth signal into multiple lower bandwidth (rate) signals by an analysis bank. After processing, the subband signals are recombined into a fullband output signal by a synthesis bank. This paper describes an implementation of the analysis and synthesis banks using (Field Programmable Gate Arrays) FPGAs.

  14. Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy.

    PubMed

    Zelyak, O; Fallone, B G; St-Aubin, J

    2017-12-14

    Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.

  15. Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy

    NASA Astrophysics Data System (ADS)

    Zelyak, O.; Fallone, B. G.; St-Aubin, J.

    2018-01-01

    Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low-density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation.

  16. Corrigendum to "Stability analysis of a deterministic dose calculation for MRI-guided radiotherapy".

    PubMed

    Zelyak, Oleksandr; Fallone, B Gino; St-Aubin, Joel

    2018-03-12

    Modern effort in radiotherapy to address the challenges of tumor localization and motion has led to the development of MRI guided radiotherapy technologies. Accurate dose calculations must properly account for the effects of the MRI magnetic fields. Previous work has investigated the accuracy of a deterministic linear Boltzmann transport equation (LBTE) solver that includes magnetic field, but not the stability of the iterative solution method. In this work, we perform a stability analysis of this deterministic algorithm including an investigation of the convergence rate dependencies on the magnetic field, material density, energy, and anisotropy expansion. The iterative convergence rate of the continuous and discretized LBTE including magnetic fields is determined by analyzing the spectral radius using Fourier analysis for the stationary source iteration (SI) scheme. The spectral radius is calculated when the magnetic field is included (1) as a part of the iteration source, and (2) inside the streaming-collision operator. The non-stationary Krylov subspace solver GMRES is also investigated as a potential method to accelerate the iterative convergence, and an angular parallel computing methodology is investigated as a method to enhance the efficiency of the calculation. SI is found to be unstable when the magnetic field is part of the iteration source, but unconditionally stable when the magnetic field is included in the streaming-collision operator. The discretized LBTE with magnetic fields using a space-angle upwind stabilized discontinuous finite element method (DFEM) was also found to be unconditionally stable, but the spectral radius rapidly reaches unity for very low density media and increasing magnetic field strengths indicating arbitrarily slow convergence rates. However, GMRES is shown to significantly accelerate the DFEM convergence rate showing only a weak dependence on the magnetic field. In addition, the use of an angular parallel computing strategy is shown to potentially increase the efficiency of the dose calculation. © 2018 Institute of Physics and Engineering in Medicine.

  17. Multiprocessor smalltalk: Implementation, performance, and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less

  18. Exploiting loop level parallelism in nonprocedural dataflow programs

    NASA Technical Reports Server (NTRS)

    Gokhale, Maya B.

    1987-01-01

    Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.

  19. Fault Tolerant Parallel Implementations of Iterative Algorithms for Optimal Control Problems

    DTIC Science & Technology

    1988-01-21

    p/.V)] steps, but did not discuss any specific parallel implementation. Gajski [51 improved upon this result by performing the SIMD computation in...N = p2. our approach reduces to that of [51, except that Gajski presents the coefficient computation and partial solution phases as a single...8217>. the SIMD algo- rithm presented by Gajski [5] can be most efficiently mapped to a unidirec- tional ring network with broadcasting capability. Based

  20. Synthesis of Nanoporous Activated Iridium Oxide Films by Anodized Aluminum Oxide Templated Atomic Layer Deposition

    DTIC Science & Technology

    2010-11-01

    number of deposition strategies, including sputtering [10–12] and electrodeposition [13,14]. With all synthesis strategies, control of the film...to 10% ozone in 400 sccm O2 for 10 min. A 20 Å Al2O3 film was then deposited as a nucleation layer by iterative exposures of trimethyla- luminum and

  1. PRIM: An Efficient Preconditioning Iterative Reweighted Least Squares Method for Parallel Brain MRI Reconstruction.

    PubMed

    Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou

    2018-02-08

    The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. While joint total variation (JTV) regularized model has been demonstrated as a powerful tool in increasing sampling speed for pMRI, however, the major bottleneck is the inefficiency of the optimization method. While all present state-of-the-art optimizations for the JTV model could only reach a sublinear convergence rate, in this paper, we squeeze the performance by proposing a linear-convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI regarding both accuracy and efficiency compared with state-of-the-art methods.

  2. SPIRiT: Iterative Self-consistent Parallel Imaging Reconstruction from Arbitrary k-Space

    PubMed Central

    Lustig, Michael; Pauly, John M.

    2010-01-01

    A new approach to autocalibrating, coil-by-coil parallel imaging reconstruction is presented. It is a generalized reconstruction framework based on self consistency. The reconstruction problem is formulated as an optimization that yields the most consistent solution with the calibration and acquisition data. The approach is general and can accurately reconstruct images from arbitrary k-space sampling patterns. The formulation can flexibly incorporate additional image priors such as off-resonance correction and regularization terms that appear in compressed sensing. Several iterative strategies to solve the posed reconstruction problem in both image and k-space domain are presented. These are based on a projection over convex sets (POCS) and a conjugate gradient (CG) algorithms. Phantom and in-vivo studies demonstrate efficient reconstructions from undersampled Cartesian and spiral trajectories. Reconstructions that include off-resonance correction and nonlinear ℓ1-wavelet regularization are also demonstrated. PMID:20665790

  3. Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

    NASA Astrophysics Data System (ADS)

    Yang, Xiang; Mittal, Rajat

    2017-03-01

    In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.

  4. Aerodynamic optimization by simultaneously updating flow variables and design parameters

    NASA Technical Reports Server (NTRS)

    Rizk, M. H.

    1990-01-01

    The application of conventional optimization schemes to aerodynamic design problems leads to inner-outer iterative procedures that are very costly. An alternative approach is presented based on the idea of updating the flow variable iterative solutions and the design parameter iterative solutions simultaneously. Two schemes based on this idea are applied to problems of correcting wind tunnel wall interference and optimizing advanced propeller designs. The first of these schemes is applicable to a limited class of two-design-parameter problems with an equality constraint. It requires the computation of a single flow solution. The second scheme is suitable for application to general aerodynamic problems. It requires the computation of several flow solutions in parallel. In both schemes, the design parameters are updated as the iterative flow solutions evolve. Computations are performed to test the schemes' efficiency, accuracy, and sensitivity to variations in the computational parameters.

  5. Fast iterative censoring CFAR algorithm for ship detection from SAR images

    NASA Astrophysics Data System (ADS)

    Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng

    2017-11-01

    Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure to eliminate the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; and then, an iterative censoring CFAR algorithm is used to detect ship candidates from each target blocks adaptively and efficiently, where parallel detection is available, and statistical parameters of G0 distribution fitting local sea clutter well can be quickly estimated based on an integral image operator. Experimental results of TerraSAR-X images demonstrate the effectiveness of the proposed technique.

  6. Automated three-component synthesis of a library of γ-lactams

    PubMed Central

    Fenster, Erik; Hill, David; Reiser, Oliver

    2012-01-01

    Summary A three-component method for the synthesis of γ-lactams from commercially available maleimides, aldehydes, and amines was adapted to parallel library synthesis. Improvements to the chemistry over previous efforts include the optimization of the method to a one-pot process, the management of by-products and excess reagents, the development of an automated parallel sequence, and the adaption of the method to permit the preparation of enantiomerically enriched products. These efforts culminated in the preparation of a library of 169 γ-lactams. PMID:23209515

  7. Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness

    NASA Astrophysics Data System (ADS)

    Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.

    2018-03-01

    This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational Awareness computer cluster at the LASR Lab, Texas A&M University. We demonstrate the power of our tool by solving a highly parallel example problem, that is the generation of extremal field maps for optimal spacecraft rendezvous (and eventual orbit debris removal). In addition we demonstrate the need for including perturbative effects in simulations for satellite tracking or data association. The unified Lambert tool is ideal for but not limited to space situational awareness applications.

  8. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  9. Exploiting Vector and Multicore Parallelsim for Recursive, Data- and Task-Parallel Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

    Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that thesemore » schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.« less

  10. Parallel AFSA algorithm accelerating based on MIC architecture

    NASA Astrophysics Data System (ADS)

    Zhou, Junhao; Xiao, Hong; Huang, Yifan; Li, Yongzhao; Xu, Yuanrui

    2017-05-01

    Analysis AFSA past for solving the traveling salesman problem, the algorithm efficiency is often a big problem, and the algorithm processing method, it does not fully responsive to the characteristics of the traveling salesman problem to deal with, and therefore proposes a parallel join improved AFSA process. The simulation with the current TSP known optimal solutions were analyzed, the results showed that the AFSA iterations improved less, on the MIC cards doubled operating efficiency, efficiency significantly.

  11. Air Force Maui Optical and Supercomputing Site (AMOS) Application Briefs 2004

    DTIC Science & Technology

    2004-01-01

    Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection...17. LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 60 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b. ABSTRACT...SOR methods, the work in parallel matrix iterative methods is very relevant to the parallelization of a CA. The Moore neighborhood used in this work

  12. Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

    NASA Astrophysics Data System (ADS)

    Chacon, L.; Finn, J. M.; Knoll, D. A.

    2000-10-01

    Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.

  13. A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoemmen, Mark

    2010-11-01

    Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less

  14. Novel and Efficient Synthesis of the Promising Drug Candidate Discodermolide

    DTIC Science & Technology

    2010-02-01

    stereotriad building blocks for discodermolide and related polyketide antibiotics could be obtained from variations on a short, scalable scheme that did...chains required for the chemical synthesis of the nonaromatic polyketides is usually based on the iterative lengthening of an acyclic substituted chain...burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Department of Defense

  15. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.

  16. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    NASA Astrophysics Data System (ADS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.

  17. FPGA implementation of low complexity LDPC iterative decoder

    NASA Astrophysics Data System (ADS)

    Verma, Shivani; Sharma, Sanjay

    2016-07-01

    Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family.

  18. Drifts, currents, and power scrape-off width in SOLPS-ITER modeling of DIII-D

    DOE PAGES

    Meier, E. T.; Goldston, R. J.; Kaveeva, E. G.; ...

    2016-12-27

    The effects of drifts and associated flows and currents on the width of the parallel heat flux channel (λ q) in the tokamak scrape-off layer (SOL) are analyzed using the SOLPS-ITER 2D fluid transport code. Motivation is supplied by Goldston’s heuristic drift (HD) model for λ q, which yields the same approximately inverse poloidal magnetic field dependence seen in multi-machine regression. The analysis, focusing on a DIII-D H-mode discharge, reveals HD-like features, including comparable density and temperature fall-off lengths in the SOL, and up-down ion pressure asymmetry that allows net cross-separatrix ion magnetic drift flux to exceed net anomalous ionmore » flux. In experimentally relevant high-recycling cases, scans of both toroidal and poloidal magnetic field (B tor and B pol) are conducted, showing minimal λ q dependence on either component of the field. Insensitivity to B tor is expected, and suggests that SOLPS-ITER is effectively capturing some aspects of HD physics. Absence of λ q dependence on B pol, however, is inconsistent with both the HD model and experimental results. As a result, the inconsistency is attributed to strong variation in the parallel Mach number, which violates one of the premises of the HD model.« less

  19. Detection of mouse liver cancer via a parallel iterative shrinkage method in hybrid optical/microcomputed tomography imaging

    NASA Astrophysics Data System (ADS)

    Wu, Ping; Liu, Kai; Zhang, Qian; Xue, Zhenwen; Li, Yongbao; Ning, Nannan; Yang, Xin; Li, Xingde; Tian, Jie

    2012-12-01

    Liver cancer is one of the most common malignant tumors worldwide. In order to enable the noninvasive detection of small liver tumors in mice, we present a parallel iterative shrinkage (PIS) algorithm for dual-modality tomography. It takes advantage of microcomputed tomography and multiview bioluminescence imaging, providing anatomical structure and bioluminescence intensity information to reconstruct the size and location of tumors. By incorporating prior knowledge of signal sparsity, we associate some mathematical strategies including specific smooth convex approximation, an iterative shrinkage operator, and affine subspace with the PIS method, which guarantees the accuracy, efficiency, and reliability for three-dimensional reconstruction. Then an in vivo experiment on the bead-implanted mouse has been performed to validate the feasibility of this method. The findings indicate that a tiny lesion less than 3 mm in diameter can be localized with a position bias no more than 1 mm the computational efficiency is one to three orders of magnitude faster than the existing algorithms; this approach is robust to the different regularization parameters and the lp norms. Finally, we have applied this algorithm to another in vivo experiment on an HCCLM3 orthotopic xenograft mouse model, which suggests the PIS method holds the promise for practical applications of whole-body cancer detection.

  20. A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Lim, Chieng-Fai

    1991-01-01

    The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.

  1. Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moryakov, A. V., E-mail: sailor@orc.ru

    2016-12-15

    An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.

  2. A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

    In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller unit of tasks, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing run- time. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.

  3. Short Enantioselective Total Synthesis of Tatanan A and 3‐epi‐Tatanan A Using Assembly‐Line Synthesis

    PubMed Central

    Noble, Adam; Roesner, Stefan

    2016-01-01

    Abstract Short and highly stereoselective total syntheses of the sesquilignan natural product tatanan A and its C3 epimer are described. An assembly‐line synthesis approach, using iterative lithiation–borylation reactions, was applied to install the three contiguous stereocenters with high enantio‐ and diastereoselectivity. One of the stereocenters was installed using a configurationally labile lithiated primary benzyl benzoate, resulting in high levels of substrate‐controlled (undesired) diastereoselectivity. However, reversal of selectivity was achieved by using a novel diastereoselective Matteson homologation. Stereospecific alkynylation of a hindered secondary benzylic boronic ester enabled completion of the synthesis in a total of eight steps. PMID:27865037

  4. Chebyshev polynomial filtered subspace iteration in the discontinuous Galerkin method for large-scale electronic structure calculations

    DOE PAGES

    Banerjee, Amartya S.; Lin, Lin; Hu, Wei; ...

    2016-10-21

    The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis (ALB) set to solve the Kohn-Sham equations of density functional theory in a discontinuous Galerkin framework. The adaptive local basis is generated on-the-fly to capture the local material physics and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. A central issue for large-scale calculations, however, is the computation of the electron density (and subsequently, ground state properties) from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) canmore » be used to address this issue and push the envelope in large-scale materials simulations in a discontinuous Galerkin framework. We describe how the subspace filtering steps can be performed in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to the orthogonality of the DG basis set and block-sparse structure of the DG Hamiltonian matrix. The on-the-fly nature of the ALB functions requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI approach in calculations of large-scale twodimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems. In conclusion, employing 55 296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8586 atoms is 90 s, and the time for a graphene sheet containing 11 520 atoms is 75 s.« less

  5. Acceleration of linear stationary iterative processes in multiprocessor computers. II

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Romm, Ya.E.

    1982-05-01

    For pt.I, see Kibernetika, vol.18, no.1, p.47 (1982). For pt.I, see Cybernetics, vol.18, no.1, p.54 (1982). Considers a reduced system of linear algebraic equations x=ax+b, where a=(a/sub ij/) is a real n*n matrix; b is a real vector with common euclidean norm >>>. It is supposed that the existence and uniqueness of solution det (0-a) not equal to e is given, where e is a unit matrix. The linear iterative process converging to x x/sup (k+1)/=fx/sup (k)/, k=0, 1, 2, ..., where the operator f translates r/sup n/ into r/sup n/. In considering implementation of the iterative process (ip) inmore » a multiprocessor system, it is assumed that the number of processors is constant, and are various values of the latter investigated; it is assumed in addition, that the processors perform elementary binary arithmetic operations of addition and multiestimates only include the time of execution of arithmetic operations. With any paralleling of individual iteration, the execution time of the ip is proportional to the number of sequential steps k+1. The author sets the task of reducing the number of sequential steps in the ip so as to execute it in a time proportional to a value smaller than k+1. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the ip, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.« less

  6. Reconstruction of coded aperture images

    NASA Technical Reports Server (NTRS)

    Bielefeld, Michael J.; Yin, Lo I.

    1987-01-01

    Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.

  7. Research in Computational Aeroscience Applications Implemented on Advanced Parallel Computing Systems

    NASA Technical Reports Server (NTRS)

    Wigton, Larry

    1996-01-01

    Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.

  8. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of two-dimensional fractional differential equation (2D-TFDE) with iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.

  9. The wave-based substructuring approach for the efficient description of interface dynamics in substructuring

    NASA Astrophysics Data System (ADS)

    Donders, S.; Pluymers, B.; Ragnarsson, P.; Hadjit, R.; Desmet, W.

    2010-04-01

    In the vehicle design process, design decisions are more and more based on virtual prototypes. Due to competitive and regulatory pressure, vehicle manufacturers are forced to improve product quality, to reduce time-to-market and to launch an increasing number of design variants on the global market. To speed up the design iteration process, substructuring and component mode synthesis (CMS) methods are commonly used, involving the analysis of substructure models and the synthesis of the substructure analysis results. Substructuring and CMS enable efficient decentralized collaboration across departments and allow to benefit from the availability of parallel computing environments. However, traditional CMS methods become prohibitively inefficient when substructures are coupled along large interfaces, i.e. with a large number of degrees of freedom (DOFs) at the interface between substructures. The reason is that the analysis of substructures involves the calculation of a number of enrichment vectors, one for each interface degree of freedom (DOF). Since large interfaces are common in vehicles (e.g. the continuous line connections to connect the body with the windshield, roof or floor), this interface bottleneck poses a clear limitation in the vehicle noise, vibration and harshness (NVH) design process. Therefore there is a need to describe the interface dynamics more efficiently. This paper presents a wave-based substructuring (WBS) approach, which allows reducing the interface representation between substructures in an assembly by expressing the interface DOFs in terms of a limited set of basis functions ("waves"). As the number of basis functions can be much lower than the number of interface DOFs, this greatly facilitates the substructure analysis procedure and results in faster design predictions. The waves are calculated once from a full nominal assembly analysis, but these nominal waves can be re-used for the assembly of modified components. The WBS approach thus enables efficient structural modification predictions of the global modes, so that efficient vibro-acoustic design modification, optimization and robust design become possible. The results show that wave-based substructuring offers a clear benefit for vehicle design modifications, by improving both the speed of component reduction processes and the efficiency and accuracy of design iteration predictions, as compared to conventional substructuring approaches.

  10. Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains

    DOE PAGES

    Bunting, Gregory; Prakash, Arun; Walsh, Timothy; ...

    2018-01-26

    Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less

  11. Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bunting, Gregory; Prakash, Arun; Walsh, Timothy

    Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and howmore » this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To faciliate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.« less

  12. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.

  13. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774

  14. Dimensional synthesis of a 3-DOF parallel manipulator with full circle rotation

    NASA Astrophysics Data System (ADS)

    Ni, Yanbing; Wu, Nan; Zhong, Xueyong; Zhang, Biao

    2015-07-01

    Parallel robots are widely used in the academic and industrial fields. In spite of the numerous achievements in the design and dimensional synthesis of the low-mobility parallel robots, few research efforts are directed towards the asymmetric 3-DOF parallel robots whose end-effector can realize 2 translational and 1 rotational(2T1R) motion. In order to develop a manipulator with the capability of full circle rotation to enlarge the workspace, a new 2T1R parallel mechanism is proposed. The modeling approach and kinematic analysis of this proposed mechanism are investigated. Using the method of vector analysis, the inverse kinematic equations are established. This is followed by a vigorous proof that this mechanism attains an annular workspace through its circular rotation and 2 dimensional translations. Taking the first order perturbation of the kinematic equations, the error Jacobian matrix which represents the mapping relationship between the error sources of geometric parameters and the end-effector position errors is derived. With consideration of the constraint conditions of pressure angles and feasible workspace, the dimensional synthesis is conducted with a goal to minimize the global comprehensive performance index. The dimension parameters making the mechanism to have optimal error mapping and kinematic performance are obtained through the optimization algorithm. All these research achievements lay the foundation for the prototype building of such kind of parallel robots.

  15. Multilevel acceleration of scattering-source iterations with application to electron transport

    DOE PAGES

    Drumm, Clif; Fan, Wesley

    2017-08-18

    Acceleration/preconditioning strategies available in the SCEPTRE radiation transport code are described. A flexible transport synthetic acceleration (TSA) algorithm that uses a low-order discrete-ordinates (S N) or spherical-harmonics (P N) solve to accelerate convergence of a high-order S N source-iteration (SI) solve is described. Convergence of the low-order solves can be further accelerated by applying off-the-shelf incomplete-factorization or algebraic-multigrid methods. Also available is an algorithm that uses a generalized minimum residual (GMRES) iterative method rather than SI for convergence, using a parallel sweep-based solver to build up a Krylov subspace. TSA has been applied as a preconditioner to accelerate the convergencemore » of the GMRES iterations. The methods are applied to several problems involving electron transport and problems with artificial cross sections with large scattering ratios. These methods were compared and evaluated by considering material discontinuities and scattering anisotropy. Observed accelerations obtained are highly problem dependent, but speedup factors around 10 have been observed in typical applications.« less

  16. Iterative Importance Sampling Algorithms for Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.

    In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less

  17. Application of a novel design paradigm to generate general nonpeptide combinatorial templates mimicking beta-turns: synthesis of ligands for melanocortin receptors.

    PubMed

    Webb, Thomas R; Jiang, Luyong; Sviridov, Sergey; Venegas, Ruben E; Vlaskina, Anna V; McGrath, Douglas; Tucker, John; Wang, Jian; Deschenes, Alain; Li, Rongshi

    2007-01-01

    We report the further application of a novel approach to template and ligand design by the synthesis of agonists of the melanocortin receptor. This design method uses the conserved structural data from the three-dimensional conformations of beta-turn peptides to design rigid nonpeptide templates that mimic the orientation of the main chain C-alpha atoms in a peptide beta-turn. We report details on a new synthesis of derivatives of template 1 that are useful for the synthesis of exploratory libraries. The utility of this technique is further exemplified by several iterative rounds of high-throughput synthesis and screening, which result in new partially optimized nonpeptide agonists for several melanocortin receptors.

  18. Capabilities of Fully Parallelized MHD Stability Code MARS

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2016-10-01

    Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.

  19. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  20. Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.

    PubMed

    Tao, Liang; Kwan, Hon Keung

    2012-07-01

    Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.

  1. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kannan, Ramakrishnan; Sukumar, Sreenivas R.; Ballard, Grey M.

    NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems formore » $$\\WW$$ and $$\\HH$$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $$\\WW$$ and $$\\HH$$ within the alternating iterations.« less

  2. Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods

    PubMed Central

    Smith, David S.; Gore, John C.; Yankeelov, Thomas E.; Welch, E. Brian

    2012-01-01

    Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 40962 or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 10242 and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images. PMID:22481908

  3. Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods.

    PubMed

    Smith, David S; Gore, John C; Yankeelov, Thomas E; Welch, E Brian

    2012-01-01

    Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 4096(2) or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 1024(2) and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images.

  4. Catalyst-controlled oligomerization for the collective synthesis of polypyrroloindoline natural products.

    PubMed

    Jamison, Christopher R; Badillo, Joseph J; Lipshultz, Jeffrey M; Comito, Robert J; MacMillan, David W C

    2017-12-01

    In nature, many organisms generate large families of natural product metabolites that have related molecular structures as a means to increase functional diversity and gain an evolutionary advantage against competing systems within the same environment. One pathway commonly employed by living systems to generate these large classes of structurally related families is oligomerization, wherein a series of enzymatically catalysed reactions is employed to generate secondary metabolites by iteratively appending monomers to a growing serial oligomer chain. The polypyrroloindolines are an interesting class of oligomeric natural products that consist of multiple cyclotryptamine subunits. Herein we describe an iterative application of asymmetric copper catalysis towards the synthesis of six distinct oligomeric polypyrroloindoline natural products: hodgkinsine, hodgkinsine B, idiospermuline, quadrigemine H and isopsychotridine B and C. Given the customizable nature of the small-molecule catalysts employed, we demonstrate that this strategy is further amenable to the construction of quadrigemine H-type alkaloids not isolated previously from natural sources.

  5. Catalyst-controlled oligomerization for the collective synthesis of polypyrroloindoline natural products

    NASA Astrophysics Data System (ADS)

    Jamison, Christopher R.; Badillo, Joseph J.; Lipshultz, Jeffrey M.; Comito, Robert J.; MacMillan, David W. C.

    2017-12-01

    In nature, many organisms generate large families of natural product metabolites that have related molecular structures as a means to increase functional diversity and gain an evolutionary advantage against competing systems within the same environment. One pathway commonly employed by living systems to generate these large classes of structurally related families is oligomerization, wherein a series of enzymatically catalysed reactions is employed to generate secondary metabolites by iteratively appending monomers to a growing serial oligomer chain. The polypyrroloindolines are an interesting class of oligomeric natural products that consist of multiple cyclotryptamine subunits. Herein we describe an iterative application of asymmetric copper catalysis towards the synthesis of six distinct oligomeric polypyrroloindoline natural products: hodgkinsine, hodgkinsine B, idiospermuline, quadrigemine H and isopsychotridine B and C. Given the customizable nature of the small-molecule catalysts employed, we demonstrate that this strategy is further amenable to the construction of quadrigemine H-type alkaloids not isolated previously from natural sources.

  6. New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

    NASA Technical Reports Server (NTRS)

    Nguyen, Duc T.

    1998-01-01

    Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.

  7. Segmentation of remotely sensed data using parallel region growing

    NASA Technical Reports Server (NTRS)

    Tilton, J. C.; Cox, S. C.

    1983-01-01

    The improved spatial resolution of the new earth resources satellites will increase the need for effective utilization of spatial information in machine processing of remotely sensed data. One promising technique is scene segmentation by region growing. Region growing can use spatial information in two ways: only spatially adjacent regions merge together, and merging criteria can be based on region-wide spatial features. A simple region growing approach is described in which the similarity criterion is based on region mean and variance (a simple spatial feature). An effective way to implement region growing for remote sensing is as an iterative parallel process on a large parallel processor. A straightforward parallel pixel-based implementation of the algorithm is explored and its efficiency is compared with sequential pixel-based, sequential region-based, and parallel region-based implementations. Experimental results from on aircraft scanner data set are presented, as is a discussioon of proposed improvements to the segmentation algorithm.

  8. Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

    NASA Technical Reports Server (NTRS)

    Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

    2005-01-01

    The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems, arised from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should fully take advantages of multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective. the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural, and acoustic (symmetrical and un-symmetrical) problems (in different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.

  9. User's Guide for ENSAERO_FE Parallel Finite Element Solver

    NASA Technical Reports Server (NTRS)

    Eldred, Lloyd B.; Guruswamy, Guru P.

    1999-01-01

    A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRANKOSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.

  10. Parallel algorithms for placement and routing in VLSI design. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall Jay

    1991-01-01

    The computational requirements for high quality synthesis, analysis, and verification of very large scale integration (VLSI) designs have rapidly increased with the fast growing complexity of these designs. Research in the past has focused on the development of heuristic algorithms, special purpose hardware accelerators, or parallel algorithms for the numerous design tasks to decrease the time required for solution. Two new parallel algorithms are proposed for two VLSI synthesis tasks, standard cell placement and global routing. The first algorithm, a parallel algorithm for global routing, uses hierarchical techniques to decompose the routing problem into independent routing subproblems that are solved in parallel. Results are then presented which compare the routing quality to the results of other published global routers and which evaluate the speedups attained. The second algorithm, a parallel algorithm for cell placement and global routing, hierarchically integrates a quadrisection placement algorithm, a bisection placement algorithm, and the previous global routing algorithm. Unique partitioning techniques are used to decompose the various stages of the algorithm into independent tasks which can be evaluated in parallel. Finally, results are presented which evaluate the various algorithm alternatives and compare the algorithm performance to other placement programs. Measurements are presented on the parallel speedups available.

  11. Increasing reconstruction quality of diffractive optical elements displayed with LC SLM

    NASA Astrophysics Data System (ADS)

    Cheremkhin, Pavel A.; Evtikhiev, Nikolay N.; Krasnov, Vitaly V.; Rodin, Vladislav G.; Starikov, Sergey N.

    2015-03-01

    Phase liquid crystal (LC) spatial light modulators (SLM) are actively used in various applications. However, majority of scientific applications require stable phase modulation which might be hard to achieve with commercially available SLM due to its consumer origin. The use of digital voltage addressing scheme leads to phase temporal fluctuations, which results in lower diffraction efficiency and reconstruction quality of displayed diffractive optical elements (DOE). Due to high periodicity of fluctuations it should be possible to use knowledge of these fluctuations during DOE synthesis to minimize negative effect. We synthesized DOE using accurately measured phase fluctuations of phase LC SLM "HoloEye PLUTO VIS" to minimize its negative impact on displayed DOE reconstruction. Synthesis was conducted with versatile direct search with random trajectory (DSRT) method in the following way. Before DOE synthesis begun, two-dimensional dependency of SLM phase shift on addressed signal level and time from frame start was obtained. Then synthesis begins. First, initial phase distribution is created. Second, random trajectory of consecutive processing of all DOE elements is generated. Then iterative process begins. Each DOE element sequentially has its value changed to one that provides better value of objective criterion, e.g. lower deviation of reconstructed image from original one. If current element value provides best objective criterion value then it left unchanged. After all elements are processed, iteration repeats until stagnation is reached. It is demonstrated that application of SLM phase fluctuations knowledge in DOE synthesis with DSRT method leads to noticeable increase of DOE reconstruction quality.

  12. Non-Cartesian Parallel Imaging Reconstruction

    PubMed Central

    Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

    2014-01-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499

  13. Automatic Synthesis of UML Designs from Requirements in an Iterative Process

    NASA Technical Reports Server (NTRS)

    Schumann, Johann; Whittle, Jon; Clancy, Daniel (Technical Monitor)

    2001-01-01

    The Unified Modeling Language (UML) is gaining wide popularity for the design of object-oriented systems. UML combines various object-oriented graphical design notations under one common framework. A major factor for the broad acceptance of UML is that it can be conveniently used in a highly iterative, Use Case (or scenario-based) process (although the process is not a part of UML). Here, the (pre-) requirements for the software are specified rather informally as Use Cases and a set of scenarios. A scenario can be seen as an individual trace of a software artifact. Besides first sketches of a class diagram to illustrate the static system breakdown, scenarios are a favorite way of communication with the customer, because scenarios describe concrete interactions between entities and are thus easy to understand. Scenarios with a high level of detail are often expressed as sequence diagrams. Later in the design and implementation stage (elaboration and implementation phases), a design of the system's behavior is often developed as a set of statecharts. From there (and the full-fledged class diagram), actual code development is started. Current commercial UML tools support this phase by providing code generators for class diagrams and statecharts. In practice, it can be observed that the transition from requirements to design to code is a highly iterative process. In this talk, a set of algorithms is presented which perform reasonable synthesis and transformations between different UML notations (sequence diagrams, Object Constraint Language (OCL) constraints, statecharts). More specifically, we will discuss the following transformations: Statechart synthesis, introduction of hierarchy, consistency of modifications, and "design-debugging".

  14. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...

    2017-01-28

    Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less

  15. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De

    Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less

  16. Virtual fringe projection system with nonparallel illumination based on iteration

    NASA Astrophysics Data System (ADS)

    Zhou, Duo; Wang, Zhangying; Gao, Nan; Zhang, Zonghua; Jiang, Xiangqian

    2017-06-01

    Fringe projection profilometry has been widely applied in many fields. To set up an ideal measuring system, a virtual fringe projection technique has been studied to assist in the design of hardware configurations. However, existing virtual fringe projection systems use parallel illumination and have a fixed optical framework. This paper presents a virtual fringe projection system with nonparallel illumination. Using an iterative method to calculate intersection points between rays and reference planes or object surfaces, the proposed system can simulate projected fringe patterns and captured images. A new explicit calibration method has been presented to validate the precision of the system. Simulated results indicate that the proposed iterative method outperforms previous systems. Our virtual system can be applied to error analysis, algorithm optimization, and help operators to find ideal system parameter settings for actual measurements.

  17. Variable aperture-based ptychographical iterative engine method

    NASA Astrophysics Data System (ADS)

    Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

    2018-02-01

    A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. By adjusting the size of a tiny aperture under the illumination of a parallel light beam to change the illumination on the sample step by step and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since many fewer diffraction patterns are required than in common PIE and the shape, the size, and the position of the aperture need not to be known exactly, this proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; therefore, the proposed technique can be potentially applied for various scientific researches.

  18. Accelerating spirocyclic polyketide synthesis using flow chemistry.

    PubMed

    Newton, Sean; Carter, Catherine F; Pearson, Colin M; de C Alves, Leandro; Lange, Heiko; Thansandote, Praew; Ley, Steven V

    2014-05-05

    Over the past decade, the integration of synthetic chemistry with flow processing has resulted in a powerful platform for molecular assembly that is making an impact throughout the chemical community. Herein, we demonstrate the extension of these tools to encompass complex natural product synthesis. We have developed a number of novel flow-through processes for reactions commonly encountered in natural product synthesis programs to achieve the first total synthesis of spirodienal A and the preparation of spirangien A methyl ester. Highlights of the synthetic route include an iridium-catalyzed hydrogenation, iterative Roush crotylations, gold-catalyzed spiroketalization and a late-stage cis-selective reduction. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Efficient ICCG on a shared memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Hammond, Steven W.; Schreiber, Robert

    1989-01-01

    Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.

  20. MPF: A portable message passing facility for shared memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

    1987-01-01

    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.

  1. Efficient parallel resolution of the simplified transport equations in mixed-dual formulation

    NASA Astrophysics Data System (ADS)

    Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.

    2011-03-01

    A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.

  2. State Transition Matrix for Perturbed Orbital Motion Using Modified Chebyshev Picard Iteration

    NASA Astrophysics Data System (ADS)

    Read, Julie L.; Younes, Ahmad Bani; Macomber, Brent; Turner, James; Junkins, John L.

    2015-06-01

    The Modified Chebyshev Picard Iteration (MCPI) method has recently proven to be highly efficient for a given accuracy compared to several commonly adopted numerical integration methods, as a means to solve for perturbed orbital motion. This method utilizes Picard iteration, which generates a sequence of path approximations, and Chebyshev Polynomials, which are orthogonal and also enable both efficient and accurate function approximation. The nodes consistent with discrete Chebyshev orthogonality are generated using cosine sampling; this strategy also reduces the Runge effect and as a consequence of orthogonality, there is no matrix inversion required to find the basis function coefficients. The MCPI algorithms considered herein are parallel-structured so that they are immediately well-suited for massively parallel implementation with additional speedup. MCPI has a wide range of applications beyond ephemeris propagation, including the propagation of the State Transition Matrix (STM) for perturbed two-body motion. A solution is achieved for a spherical harmonic series representation of earth gravity (EGM2008), although the methodology is suitable for application to any gravity model. Included in this representation the normalized, Associated Legendre Functions are given and verified numerically. Modifications of the classical algorithm techniques, such as rewriting the STM equations in a second-order cascade formulation, gives rise to additional speedup. Timing results for the baseline formulation and this second-order formulation are given.

  3. Optimization and validation of accelerated golden-angle radial sparse MRI reconstruction with self-calibrating GRAPPA operator gridding.

    PubMed

    Benkert, Thomas; Tian, Ye; Huang, Chenchan; DiBella, Edward V R; Chandarana, Hersh; Feng, Li

    2018-07-01

    Golden-angle radial sparse parallel (GRASP) MRI reconstruction requires gridding and regridding to transform data between radial and Cartesian k-space. These operations are repeatedly performed in each iteration, which makes the reconstruction computationally demanding. This work aimed to accelerate GRASP reconstruction using self-calibrating GRAPPA operator gridding (GROG) and to validate its performance in clinical imaging. GROG is an alternative gridding approach based on parallel imaging, in which k-space data acquired on a non-Cartesian grid are shifted onto a Cartesian k-space grid using information from multicoil arrays. For iterative non-Cartesian image reconstruction, GROG is performed only once as a preprocessing step. Therefore, the subsequent iterative reconstruction can be performed directly in Cartesian space, which significantly reduces computational burden. Here, a framework combining GROG with GRASP (GROG-GRASP) is first optimized and then compared with standard GRASP reconstruction in 22 prostate patients. GROG-GRASP achieved approximately 4.2-fold reduction in reconstruction time compared with GRASP (∼333 min versus ∼78 min) while maintaining image quality (structural similarity index ≈ 0.97 and root mean square error ≈ 0.007). Visual image quality assessment by two experienced radiologists did not show significant differences between the two reconstruction schemes. With a graphics processing unit implementation, image reconstruction time can be further reduced to approximately 14 min. The GRASP reconstruction can be substantially accelerated using GROG. This framework is promising toward broader clinical application of GRASP and other iterative non-Cartesian reconstruction methods. Magn Reson Med 80:286-293, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.

  4. Analysis of surface wave propagation in a grounded dielectric slab covered by a resistive sheet

    NASA Technical Reports Server (NTRS)

    Shively, David G.

    1992-01-01

    Both parallel and perpendicular polarized surface waves are known to propagate on lossless and lossy grounded dielectric slabs. Surface wave propagation on a grounded dielectric slab covered with a resistive sheet is considered. Both parallel and perpendicular polarizations are examined. Transcendental equations are derived for each polarization and are solved using iterative techniques. Attenuation and phase velocity are shown for representative geometries. The results are applicable to both a grounded slab with a resistive sheet and an ungrounded slab covered on each side with a resistive sheet.

  5. Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saad, Yousef

    2014-01-16

    The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project begun with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners formore » solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version on our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth.10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.« less

  6. SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Loikith, P.; Lee, H.; McGibbney, L. J.; Whitehall, K. D.

    2014-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning fast Big Data technology called SciSpark based on ApacheTM Spark. Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based ApacheTM Hadoop by 100x in memory and by 10x on disk, and makes iterative algorithms feasible. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 100 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning (ML) based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesocale Convective Complexes. The goals of SciSpark are to: (1) Decrease the time to compute comparison statistics and plots from minutes to seconds; (2) Allow for interactive exploration of time-series properties over seasons and years; (3) Decrease the time for satellite data ingestion into RCMES to hours; (4) Allow for Level-2 comparisons with higher-order statistics or PDF's in minutes to hours; and (5) Move RCMES into a near real time decision-making platform. We will report on: the architecture and design of SciSpark, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning (sharding) of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDF's, and first metrics quantifying parallel speedups and memory & disk usage.

  7. Microwave-assisted one-pot synthesis of ring-fused aminals under catalyst- and solvent-free conditions

    EPA Science Inventory

    Heterocyclic compounds hold a special place in drug discovery and variety of techniques such as combinatorial synthesis, parallel synthesis, and automated library production to increase the output of these entities has been developed. Although most of these techniques are rapid a...

  8. Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

    NASA Technical Reports Server (NTRS)

    Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

    1996-01-01

    We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.

  9. Variable aperture-based ptychographical iterative engine method.

    PubMed

    Sun, Aihui; Kong, Yan; Meng, Xin; He, Xiaoliang; Du, Ruijun; Jiang, Zhilong; Liu, Fei; Xue, Liang; Wang, Shouyu; Liu, Cheng

    2018-02-01

    A variable aperture-based ptychographical iterative engine (vaPIE) is demonstrated both numerically and experimentally to reconstruct the sample phase and amplitude rapidly. By adjusting the size of a tiny aperture under the illumination of a parallel light beam to change the illumination on the sample step by step and recording the corresponding diffraction patterns sequentially, both the sample phase and amplitude can be faithfully reconstructed with a modified ptychographical iterative engine (PIE) algorithm. Since many fewer diffraction patterns are required than in common PIE and the shape, the size, and the position of the aperture need not to be known exactly, this proposed vaPIE method remarkably reduces the data acquisition time and makes PIE less dependent on the mechanical accuracy of the translation stage; therefore, the proposed technique can be potentially applied for various scientific researches. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  10. On iterative processes in the Krylov-Sonneveld subspaces

    NASA Astrophysics Data System (ADS)

    Ilin, Valery P.

    2016-10-01

    The iterative Induced Dimension Reduction (IDR) methods are considered for solving large systems of linear algebraic equations (SLAEs) with nonsingular nonsymmetric matrices. These approaches are investigated by many authors and are charachterized sometimes as the alternative to the classical processes of Krylov type. The key moments of the IDR algorithms consist in the construction of the embedded Sonneveld subspaces, which have the decreasing dimensions and use the orthogonalization to some fixed subspace. Other independent approaches for research and optimization of the iterations are based on the augmented and modified Krylov subspaces by using the aggregation and deflation procedures with present various low rank approximations of the original matrices. The goal of this paper is to show, that IDR method in Sonneveld subspaces present an original interpretation of the modified algorithms in the Krylov subspaces. In particular, such description is given for the multi-preconditioned semi-conjugate direction methods which are actual for the parallel algebraic domain decomposition approaches.

  11. Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

    DOE PAGES

    Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

    2015-12-01

    We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less

  12. Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

    We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less

  13. Gauss-Seidel Iterative Method as a Real-Time Pile-Up Solver of Scintillation Pulses

    NASA Astrophysics Data System (ADS)

    Novak, Roman; Vencelj, Matja¿

    2009-12-01

    The pile-up rejection in nuclear spectroscopy has been confronted recently by several pile-up correction schemes that compensate for distortions of the signal and subsequent energy spectra artifacts as the counting rate increases. We study here a real-time capability of the event-by-event correction method, which at the core translates to solving many sets of linear equations. Tight time limits and constrained front-end electronics resources make well-known direct solvers inappropriate. We propose a novel approach based on the Gauss-Seidel iterative method, which turns out to be a stable and cost-efficient solution to improve spectroscopic resolution in the front-end electronics. We show the method convergence properties for a class of matrices that emerge in calorimetric processing of scintillation detector signals and demonstrate the ability of the method to support the relevant resolutions. The sole iteration-based error component can be brought below the sliding window induced errors in a reasonable number of iteration steps, thus allowing real-time operation. An area-efficient hardware implementation is proposed that fully utilizes the method's inherent parallelism.

  14. Iterated reaction graphs: simulating complex Maillard reaction pathways.

    PubMed

    Patel, S; Rabone, J; Russell, S; Tissen, J; Klaffke, W

    2001-01-01

    This study investigates a new method of simulating a complex chemical system including feedback loops and parallel reactions. The practical purpose of this approach is to model the actual reactions that take place in the Maillard process, a set of food browning reactions, in sufficient detail to be able to predict the volatile composition of the Maillard products. The developed framework, called iterated reaction graphs, consists of two main elements: a soup of molecules and a reaction base of Maillard reactions. An iterative process loops through the reaction base, taking reactants from and feeding products back to the soup. This produces a reaction graph, with molecules as nodes and reactions as arcs. The iterated reaction graph is updated and validated by comparing output with the main products found by classical gas-chromatographic/mass spectrometric analysis. To ensure a realistic output and convergence to desired volatiles only, the approach contains a number of novel elements: rate kinetics are treated as reaction probabilities; only a subset of the true chemistry is modeled; and the reactions are blocked into groups.

  15. Efficient parallel implicit methods for rotary-wing aerodynamics calculations

    NASA Astrophysics Data System (ADS)

    Wissink, Andrew M.

    Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.

  16. Use of general purpose graphics processing units with MODFLOW

    USGS Publications Warehouse

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.

  17. Parallelization of a blind deconvolution algorithm

    NASA Astrophysics Data System (ADS)

    Matson, Charles L.; Borelli, Kathy J.

    2006-09-01

    Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image depending how many frames of data are used for the deblurring and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.

  18. Two-dimensional parallel array technology as a new approach to automated combinatorial solid-phase organic synthesis

    PubMed

    Brennan; Biddison; Frauendorf; Schwarcz; Keen; Ecker; Davis; Tinder; Swayze

    1998-01-01

    An automated, 96-well parallel array synthesizer for solid-phase organic synthesis has been designed and constructed. The instrument employs a unique reagent array delivery format, in which each reagent utilized has a dedicated plumbing system. An inert atmosphere is maintained during all phases of a synthesis, and temperature can be controlled via a thermal transfer plate which holds the injection molded reaction block. The reaction plate assembly slides in the X-axis direction, while eight nozzle blocks holding the reagent lines slide in the Y-axis direction, allowing for the extremely rapid delivery of any of 64 reagents to 96 wells. In addition, there are six banks of fixed nozzle blocks, which deliver the same reagent or solvent to eight wells at once, for a total of 72 possible reagents. The instrument is controlled by software which allows the straightforward programming of the synthesis of a larger number of compounds. This is accomplished by supplying a general synthetic procedure in the form of a command file, which calls upon certain reagents to be added to specific wells via lookup in a sequence file. The bottle position, flow rate, and concentration of each reagent is stored in a separate reagent table file. To demonstrate the utility of the parallel array synthesizer, a small combinatorial library of hydroxamic acids was prepared in high throughput mode for biological screening. Approximately 1300 compounds were prepared on a 10 μmole scale (3-5 mg) in a few weeks. The resulting crude compounds were generally >80% pure, and were utilized directly for high throughput screening in antibacterial assays. Several active wells were found, and the activity was verified by solution-phase synthesis of analytically pure material, indicating that the system described herein is an efficient means for the parallel synthesis of compounds for lead discovery. Copyright 1998 John Wiley & Sons, Inc.

  19. Density control in ITER: an iterative learning control and robust control approach

    NASA Astrophysics Data System (ADS)

    Ravensbergen, T.; de Vries, P. C.; Felici, F.; Blanken, T. C.; Nouailletas, R.; Zabeo, L.

    2018-01-01

    Plasma density control for next generation tokamaks, such as ITER, is challenging because of multiple reasons. The response of the usual gas valve actuators in future, larger fusion devices, might be too slow for feedback control. Both pellet fuelling and the use of feedforward-based control may help to solve this problem. Also, tight density limits arise during ramp-up, due to operational limits related to divertor detachment and radiative collapses. As the number of shots available for controller tuning will be limited in ITER, in this paper, iterative learning control (ILC) is proposed to determine optimal feedforward actuator inputs based on tracking errors, obtained in previous shots. This control method can take the actuator and density limits into account and can deal with large actuator delays. However, a purely feedforward-based density control may not be sufficient due to the presence of disturbances and shot-to-shot differences. Therefore, robust control synthesis is used to construct a robustly stabilizing feedback controller. In simulations, it is shown that this combined controller strategy is able to achieve good tracking performance in the presence of shot-to-shot differences, tight constraints, and model mismatches.

  20. Optimal Iterative Task Scheduling for Parallel Simulations.

    DTIC Science & Technology

    1991-03-01

    State University, Pullman, Washington. November 1976. 19. Grimaldi , Ralph P . Discrete and Combinatorial Mathematics. Addison-Wesley. June 1989. 20...2 4.8.1 Problem Description .. .. .. .. ... .. ... .... 4-25 4.8.2 Reasons for Level-Strate- p Failure. .. .. .. .. ... 4-26...f- I CA A* overview................................ C-1 C .2 Sample A* r......................... .... C-I C-3 Evaluation P

  1. Non-iterative Voltage Stability

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Makarov, Yuri V.; Vyakaranam, Bharat; Hou, Zhangshuan

    2014-09-30

    This report demonstrates promising capabilities and performance characteristics of the proposed method using several power systems models. The new method will help to develop a new generation of highly efficient tools suitable for real-time parallel implementation. The ultimate benefit obtained will be early detection of system instability and prevention of system blackouts in real time.

  2. Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cai, Xiao-Chuan; Keyes, David; Yang, Chao

    2014-09-29

    The focus of the project is on the development and customization of some highly scalable domain decomposition based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, solid deformation), each is modeled by a certain type of Cahn-Hilliard and/or Allen-Cahn equations. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have many advantages such as ease of implementationmore » since only single field solvers are needed, but also exhibit disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces the parallel efficiency. To overcome the disadvantages, fully coupled approaches have been investigated in order to obtain full physics simulations.« less

  3. Turbo Trellis Coded Modulation With Iterative Decoding for Mobile Satellite Communications

    NASA Technical Reports Server (NTRS)

    Divsalar, D.; Pollara, F.

    1997-01-01

    In this paper, analytical bounds on the performance of parallel concatenation of two codes, known as turbo codes, and serial concatenation of two codes over fading channels are obtained. Based on this analysis, design criteria for the selection of component trellis codes for MPSK modulation, and a suitable bit-by-bit iterative decoding structure are proposed. Examples are given for throughput of 2 bits/sec/Hz with 8PSK modulation. The parallel concatenation example uses two rate 4/5 8-state convolutional codes with two interleavers. The convolutional codes' outputs are then mapped to two 8PSK modulations. The serial concatenated code example uses an 8-state outer code with rate 4/5 and a 4-state inner trellis code with 5 inputs and 2 x 8PSK outputs per trellis branch. Based on the above mentioned design criteria for fading channels, a method to obtain he structure of the trellis code with maximum diversity is proposed. Simulation results are given for AWGN and an independent Rayleigh fading channel with perfect Channel State Information (CSI).

  4. HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.

    PubMed

    Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D

    2002-08-07

    Using iterative three-dimensional (3D) reconstruction techniques for reconstruction of positron emission tomography (PET) is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization for four dedicated cluster (a shared memory fine-grain level on each node utilizing all four processors and a coarse-grain level allowing for 15 nodes) reducing the time for one core iteration from over 7 h to about 35 min.

  5. The novel implicit LU-SGS parallel iterative method based on the diffusion equation of a nuclear reactor on a GPU cluster

    NASA Astrophysics Data System (ADS)

    Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya

    2017-02-01

    GPU not only is used in the field of graphic technology but also has been widely used in areas needing a large number of numerical calculations. In the energy industry, because of low carbon, high energy density, high duration and other characteristics, the development of nuclear energy cannot easily be replaced by other energy sources. Management of core fuel is one of the major areas of concern in a nuclear power plant, and it is directly related to the economic benefits and cost of nuclear power. The large-scale reactor core expansion equation is large and complicated, so the calculation of the diffusion equation is crucial in the core fuel management process. In this paper, we use CUDA programming technology on a GPU cluster to run the LU-SGS parallel iterative calculation against the background of the diffusion equation of the reactor. We divide one-dimensional and two-dimensional mesh into a plurality of domains, with each domain evenly distributed on the GPU blocks. A parallel collision scheme is put forward that defines the virtual boundary of the grid exchange information and data transmission by non-stop collision. Compared with the serial program, the experiment shows that GPU greatly improves the efficiency of program execution and verifies that GPU is playing a much more important role in the field of numerical calculations.

  6. Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

    The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functionalmore » characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.« less

  7. Parallelization of the preconditioned IDR solver for modern multicore computer systems

    NASA Astrophysics Data System (ADS)

    Bessonov, O. A.; Fedoseyev, A. I.

    2012-10-01

    This paper present the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs iterative IDR algorithm with ILU preconditioning (user chosen ILU preconditioning order). CNSPACK has been successfully used during last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with involving the parallelization of the solver and preconditioner using Open MP environment. Results of the successful implementation for efficient parallelization are presented for the most advances computer system (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).

  8. Evaluation of the effects of patient arm attenuation in SPECT cardiac perfusion imaging

    NASA Astrophysics Data System (ADS)

    Luo, Dershan; King, M. A.; Pan, Tin-Su; Xia, Weishi

    1996-12-01

    It was hypothesized that the use of attenuation correction could compensate for degradation in the uniformity of apparent localization of imaging agents seen in cardiac walls when patients are imaged with arms at their sides. Noise-free simulations of the digital MCAT phantom were employed to investigate this hypothesis. Four variations in camera size and collimation scheme were investigated. We observed that: 1) without attenuation correction, the arms had little additional influences on the uniformity of the heart for 180/spl deg/ reconstructions and caused a small increase in nonuniformity for 360/spl deg/ reconstructions, where the impact of both arms was included; 2) change in patient size had more of an impact on count uniformity than the presence of the arms, either with or without attenuation correction; 3) for a low number of iterations and large patient size, slightly better uniformity was obtained from parallel emission data than from fan-beam emission data, independent of whether parallel or fan-beam transmission data was used to reconstruct the attenuation maps; and 4) for all camera configurations, uniformity was improved with attenuation correction and, given sufficient number of iterations, it was compatible among different imaging geometry combinations. Thus, iterative algorithms can compensate for the additional attenuation imposed by larger patients or having the arms on the sides. When the arms are at the sides of the patient, however, a larger radius of rotation may be required, resulting in decreased spatial resolution.

  9. Evaluation of the effects of patient arm attenuation in SPECT cardiac perfusion imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, D.; King, M.A.; Pan, T.S.

    1996-12-01

    It was hypothesized that the use of attenuation correction could compensate for degradation in the uniformity of apparent localization of imaging agents seen in cardiac walls when patients are imaged with arms at their sides. Noise-free simulations of the digital MCAT phantom were employed to investigate this hypothesis. Four variations in camera size and collimation scheme were investigated. The authors observed that: (1) without attenuation correction, the arms had little additional influences on the uniformity of the heart for 180{degree} reconstructions and caused a small increase in nonuniformity for 360{degree} reconstructions, where the impact of both arms was included; (2)more » change in patient size had more of an impact on count uniformity than the presence of the arms, either with or without attenuation correction; (3) for a low number of iterations and large patient size, slightly better uniformity was obtained from parallel emission data than from fan-beam emission data, independent of whether parallel or fan-beam transmission data was used to reconstruct the attenuation maps; and (4) for all camera configurations, uniformity was improved with attenuation correction and, given sufficient number of iterations, it was compatible among different imaging geometry combinations. Thus, iterative algorithms can compensate for the additional attenuation imposed by larger patients or having the arms on the sides. When the arms are at the sides of the patient, however, a larger radius of rotation may be required, resulting in decreased spatial resolution.« less

  10. Improving the iterative Linear Interaction Energy approach using automated recognition of configurational transitions.

    PubMed

    Vosmeer, C Ruben; Kooi, Derk P; Capoferri, Luigi; Terpstra, Margreet M; Vermeulen, Nico P E; Geerke, Daan P

    2016-01-01

    Recently an iterative method was proposed to enhance the accuracy and efficiency of ligand-protein binding affinity prediction through linear interaction energy (LIE) theory. For ligand binding to flexible Cytochrome P450s (CYPs), this method was shown to decrease the root-mean-square error and standard deviation of error prediction by combining interaction energies of simulations starting from different conformations. Thereby, different parts of protein-ligand conformational space are sampled in parallel simulations. The iterative LIE framework relies on the assumption that separate simulations explore different local parts of phase space, and do not show transitions to other parts of configurational space that are already covered in parallel simulations. In this work, a method is proposed to (automatically) detect such transitions during the simulations that are performed to construct LIE models and to predict binding affinities. Using noise-canceling techniques and splines to fit time series of the raw data for the interaction energies, transitions during simulation between different parts of phase space are identified. Boolean selection criteria are then applied to determine which parts of the interaction energy trajectories are to be used as input for the LIE calculations. Here we show that this filtering approach benefits the predictive quality of our previous CYP 2D6-aryloxypropanolamine LIE model. In addition, an analysis is performed of the gain in computational efficiency that can be obtained from monitoring simulations using the proposed filtering method and by prematurely terminating simulations accordingly.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.

  12. A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson–Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chatterjee, Kausik, E-mail: kausik.chatterjee@aggiemail.usu.edu; Center for Atmospheric and Space Sciences, Utah State University, Logan, UT 84322; Roadcap, John R., E-mail: john.roadcap@us.af.mil

    The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson–Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals ofmore » the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.« less

  13. A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson-Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

    NASA Astrophysics Data System (ADS)

    Chatterjee, Kausik; Roadcap, John R.; Singh, Surendra

    2014-11-01

    The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson-Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals of the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.

  14. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  15. Predicting Silk Fiber Mechanical Properties through Multiscale Simulation and Protein Design.

    PubMed

    Rim, Nae-Gyune; Roberts, Erin G; Ebrahimi, Davoud; Dinjaski, Nina; Jacobsen, Matthew M; Martín-Moldes, Zaira; Buehler, Markus J; Kaplan, David L; Wong, Joyce Y

    2017-08-14

    Silk is a promising material for biomedical applications, and much research is focused on how application-specific, mechanical properties of silk can be designed synthetically through proper amino acid sequences and processing parameters. This protocol describes an iterative process between research disciplines that combines simulation, genetic synthesis, and fiber analysis to better design silk fibers with specific mechanical properties. Computational methods are used to assess the protein polymer structure as it forms an interconnected fiber network through shearing and how this process affects fiber mechanical properties. Model outcomes are validated experimentally with the genetic design of protein polymers that match the simulation structures, fiber fabrication from these polymers, and mechanical testing of these fibers. Through iterative feedback between computation, genetic synthesis, and fiber mechanical testing, this protocol will enable a priori prediction capability of recombinant material mechanical properties via insights from the resulting molecular architecture of the fiber network based entirely on the initial protein monomer composition. This style of protocol may be applied to other fields where a research team seeks to design a biomaterial with biomedical application-specific properties. This protocol highlights when and how the three research groups (simulation, synthesis, and engineering) should be interacting to arrive at the most effective method for predictive design of their material.

  16. Rapid and accurate synthesis of TALE genes from synthetic oligonucleotides.

    PubMed

    Wang, Fenghua; Zhang, Hefei; Gao, Jingxia; Chen, Fengjiao; Chen, Sijie; Zhang, Cuizhen; Peng, Gang

    2016-01-01

    Custom synthesis of transcription activator-like effector (TALE) genes has relied upon plasmid libraries of pre-fabricated TALE-repeat monomers or oligomers. Here we describe a novel synthesis method that directly incorporates annealed synthetic oligonucleotides into the TALE-repeat units. Our approach utilizes iterative sets of oligonucleotides and a translational frame check strategy to ensure the high efficiency and accuracy of TALE-gene synthesis. TALE arrays of more than 20 repeats can be constructed, and the majority of the synthesized constructs have perfect sequences. In addition, this novel oligonucleotide-based method can readily accommodate design changes to the TALE repeats. We demonstrated an increased gene targeting efficiency against a genomic site containing a potentially methylated cytosine by incorporating non-conventional repeat variable di-residue (RVD) sequences.

  17. An iterative method for the localization of a neutron source in a large box (container)

    NASA Astrophysics Data System (ADS)

    Dubinski, S.; Presler, O.; Alfassi, Z. B.

    2007-12-01

    The localization of an unknown neutron source in a bulky box was studied. This can be used for the inspection of cargo, to prevent the smuggling of neutron and α emitters. It is important to localize the source from the outside for safety reasons. Source localization is necessary in order to determine its activity. A previous study showed that, by using six detectors, three on each parallel face of the box (460×420×200 mm 3), the location of the source can be found with an average distance of 4.73 cm between the real source position and the calculated one and a maximal distance of about 9 cm. Accuracy was improved in this work by applying an iteration method based on four fixed detectors and the successive iteration of positioning of an external calibrating source. The initial positioning of the calibrating source is the plane of detectors 1 and 2. This method finds the unknown source location with an average distance of 0.78 cm between the real source position and the calculated one and a maximum distance of 3.66 cm for the same box. For larger boxes, localization without iterations requires an increase in the number of detectors, while localization with iterations requires only an increase in the number of iteration steps. In addition to source localization, two methods for determining the activity of the unknown source were also studied.

  18. Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations

    NASA Technical Reports Server (NTRS)

    Venkatakrishnan, V.

    1990-01-01

    The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized by using central differences. An algebraic turbulence model is also incorporated. The system of linear equations which arises out of the linearization of a fully implicit scheme is solved iteratively by the well known methods of GMRES (Generalized Minimum Residual technique) and Chebyschev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes, but has wide applications in parallel computing and unstructured mesh computations.

  19. Domain decomposition methods in computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Gropp, William D.; Keyes, David E.

    1991-01-01

    The divide-and-conquer paradigm of iterative domain decomposition, or substructuring, has become a practical tool in computational fluid dynamic applications because of its flexibility in accommodating adaptive refinement through locally uniform (or quasi-uniform) grids, its ability to exploit multiple discretizations of the operator equations, and the modular pathway it provides towards parallelism. These features are illustrated on the classic model problem of flow over a backstep using Newton's method as the nonlinear iteration. Multiple discretizations (second-order in the operator and first-order in the preconditioner) and locally uniform mesh refinement pay dividends separately, and they can be combined synergistically. Sample performance results are included from an Intel iPSC/860 hypercube implementation.

  20. Enabling the High Level Synthesis of Data Analytics Accelerators

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino

    Conventional High Level Synthesis (HLS) tools mainly tar- get compute intensive kernels typical of digital signal pro- cessing applications. We are developing techniques and ar- chitectural templates to enable HLS of data analytics appli- cations. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dy- namic task parallelism. We discuss an architectural tem- plate based around a distributed controller to efficiently ex- ploit thread level parallelism. We present a memory in- terface that supports parallel memory subsystems and en- ables implementing atomic memory operations. We intro- duce a dynamic task scheduling approach to efficiently ex- ecute heavilymore » unbalanced workload. The templates are val- idated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well know SPARQL benchmark.« less

  1. Parallel approach for bioinspired algorithms

    NASA Astrophysics Data System (ADS)

    Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina

    2018-05-01

    In the paper, a probabilistic parallel approach based on the population heuristic, such as a genetic algorithm, is suggested. The authors proposed using a multithreading approach at the micro level at which new alternative solutions are generated. On each iteration, several threads that independently used the same population to generate new solutions can be started. After the work of all threads, a selection operator combines obtained results in the new population. To confirm the effectiveness of the suggested approach, the authors have developed software on the basis of which experimental computations can be carried out. The authors have considered a classic optimization problem – finding a Hamiltonian cycle in a graph. Experiments show that due to the parallel approach at the micro level, increment of running speed can be obtained on graphs with 250 and more vertices.

  2. [Chlorophyll synthesis in cotyledons after gamma ray irradiation of black pine seeds].

    PubMed

    Bogdanović, M; Jelić, G

    1992-01-01

    The radiosensitivity of the greening system of Pinus nigra Arn. cotyledons has been studied in this paper. An exponential relation exists between the effect and dose for chlorophyll synthesis in the dark. Chlorophyll synthesis in the light roughly parallels that of chlorophyll synthesis in the dark. The restoration of chlorophyll was observed both in the light and in the dark. A stimulative effect of low doses of gamma radiation on chlorophyll synthesis was noticed. The radiosensitivity of chlorophyll a and chlorophyll b synthesis varied with the experimental conditions, suggesting that chlorophyll b synthesis might occur independently of chlorophyll a synthesis.

  3. ITER Central Solenoid Module Fabrication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, John

    The fabrication of the modules for the ITER Central Solenoid (CS) has started in a dedicated production facility located in Poway, California, USA. The necessary tools have been designed, built, installed, and tested in the facility to enable the start of production. The current schedule has first module fabrication completed in 2017, followed by testing and subsequent shipment to ITER. The Central Solenoid is a key component of the ITER tokamak providing the inductive voltage to initiate and sustain the plasma current and to position and shape the plasma. The design of the CS has been a collaborative effort betweenmore » the US ITER Project Office (US ITER), the international ITER Organization (IO) and General Atomics (GA). GA’s responsibility includes: completing the fabrication design, developing and qualifying the fabrication processes and tools, and then completing the fabrication of the seven 110 tonne CS modules. The modules will be shipped separately to the ITER site, and then stacked and aligned in the Assembly Hall prior to insertion in the core of the ITER tokamak. A dedicated facility in Poway, California, USA has been established by GA to complete the fabrication of the seven modules. Infrastructure improvements included thick reinforced concrete floors, a diesel generator for backup power, along with, cranes for moving the tooling within the facility. The fabrication process for a single module requires approximately 22 months followed by five months of testing, which includes preliminary electrical testing followed by high current (48.5 kA) tests at 4.7K. The production of the seven modules is completed in a parallel fashion through ten process stations. The process stations have been designed and built with most stations having completed testing and qualification for carrying out the required fabrication processes. The final qualification step for each process station is achieved by the successful production of a prototype coil. Fabrication of the first ITER module is in progress. The seven modules will be individually shipped to Cadarache, France upon their completion. This paper describes the processes and status of the fabrication of the CS Modules for ITER.« less

  4. Parallel Symmetric Eigenvalue Problem Solvers

    DTIC Science & Technology

    2015-05-01

    get research, tutoring, and mentoring experience as an undergraduate. Last but not least, I thank my family for their love and support. v TABLE OF...32 4.6.2 Choice of the Ritz shifts . . . . . . . . . . . . . . . . . . . . 37 4.7 Relationship between...pencil. I will conclude with a discussion of the relationship between Trace- Min and simultaneous iteration. If both methods solve the linear systems

  5. Tokamak und Stellarator - zwei Wege zur Fusionsenergie: Fusionsforschung

    NASA Astrophysics Data System (ADS)

    Milch, Isabella

    2006-07-01

    Im Laufe der Fusionsforschung haben sich zwei Bautypen für ein zukünftiges Kraftwerk als besonders aussichtsreich erwiesen: Tokamak und Stellarator. Mit dem geplanten Tokamak-Experimentalreaktor ITER steht die internationale Fusionsforschung vor der Demonstration eines Energie liefernden Plasmas. Parallel soll die in Greifswald entstehende Forschungsanlage Wendelstein 7-X die Kraftwerkstauglichkeit des alternativen Bauprinzips der Stellaratoren zeigen.

  6. A fast iterative scheme for the linearized Boltzmann equation

    NASA Astrophysics Data System (ADS)

    Wu, Lei; Zhang, Jun; Liu, Haihu; Zhang, Yonghao; Reese, Jason M.

    2017-06-01

    Iterative schemes to find steady-state solutions to the Boltzmann equation are efficient for highly rarefied gas flows, but can be very slow to converge in the near-continuum flow regime. In this paper, a synthetic iterative scheme is developed to speed up the solution of the linearized Boltzmann equation by penalizing the collision operator L into the form L = (L + Nδh) - Nδh, where δ is the gas rarefaction parameter, h is the velocity distribution function, and N is a tuning parameter controlling the convergence rate. The velocity distribution function is first solved by the conventional iterative scheme, then it is corrected such that the macroscopic flow velocity is governed by a diffusion-type equation that is asymptotic-preserving into the Navier-Stokes limit. The efficiency of this new scheme is assessed by calculating the eigenvalue of the iteration, as well as solving for Poiseuille and thermal transpiration flows. We find that the fastest convergence of our synthetic scheme for the linearized Boltzmann equation is achieved when Nδ is close to the average collision frequency. The synthetic iterative scheme is significantly faster than the conventional iterative scheme in both the transition and the near-continuum gas flow regimes. Moreover, due to its asymptotic-preserving properties, the synthetic iterative scheme does not need high spatial resolution in the near-continuum flow regime, which makes it even faster than the conventional iterative scheme. Using this synthetic scheme, with the fast spectral approximation of the linearized Boltzmann collision operator, Poiseuille and thermal transpiration flows between two parallel plates, through channels of circular/rectangular cross sections and various porous media are calculated over the whole range of gas rarefaction. Finally, the flow of a Ne-Ar gas mixture is solved based on the linearized Boltzmann equation with the Lennard-Jones intermolecular potential for the first time, and the difference between these results and those using the hard-sphere potential is discussed.

  7. Multi-GPU Acceleration of Branchless Distance Driven Projection and Backprojection for Clinical Helical CT.

    PubMed

    Mitra, Ayan; Politte, David G; Whiting, Bruce R; Williamson, Jeffrey F; O'Sullivan, Joseph A

    2017-01-01

    Model-based image reconstruction (MBIR) techniques have the potential to generate high quality images from noisy measurements and a small number of projections which can reduce the x-ray dose in patients. These MBIR techniques rely on projection and backprojection to refine an image estimate. One of the widely used projectors for these modern MBIR based technique is called branchless distance driven (DD) projection and backprojection. While this method produces superior quality images, the computational cost of iterative updates keeps it from being ubiquitous in clinical applications. In this paper, we provide several new parallelization ideas for concurrent execution of the DD projectors in multi-GPU systems using CUDA programming tools. We have introduced some novel schemes for dividing the projection data and image voxels over multiple GPUs to avoid runtime overhead and inter-device synchronization issues. We have also reduced the complexity of overlap calculation of the algorithm by eliminating the common projection plane and directly projecting the detector boundaries onto image voxel boundaries. To reduce the time required for calculating the overlap between the detector edges and image voxel boundaries, we have proposed a pre-accumulation technique to accumulate image intensities in perpendicular 2D image slabs (from a 3D image) before projection and after backprojection to ensure our DD kernels run faster in parallel GPU threads. For the implementation of our iterative MBIR technique we use a parallel multi-GPU version of the alternating minimization (AM) algorithm with penalized likelihood update. The time performance using our proposed reconstruction method with Siemens Sensation 16 patient scan data shows an average of 24 times speedup using a single TITAN X GPU and 74 times speedup using 3 TITAN X GPUs in parallel for combined projection and backprojection.

  8. Fabrication of heterogeneous nanomaterial array by programmable heating and chemical supply within microfluidic platform towards multiplexed gas sensing application

    PubMed Central

    Yang, Daejong; Kang, Kyungnam; Kim, Donghwan; Li, Zhiyong; Park, Inkyu

    2015-01-01

    A facile top-down/bottom-up hybrid nanofabrication process based on programmable temperature control and parallel chemical supply within microfluidic platform has been developed for the all liquid-phase synthesis of heterogeneous nanomaterial arrays. The synthesized materials and locations can be controlled by local heating with integrated microheaters and guided liquid chemical flow within microfluidic platform. As proofs-of-concept, we have demonstrated the synthesis of two types of nanomaterial arrays: (i) parallel array of TiO2 nanotubes, CuO nanospikes and ZnO nanowires, and (ii) parallel array of ZnO nanowire/CuO nanospike hybrid nanostructures, CuO nanospikes and ZnO nanowires. The laminar flow with negligible ionic diffusion between different precursor solutions as well as localized heating was verified by numerical calculation and experimental result of nanomaterial array synthesis. The devices made of heterogeneous nanomaterial array were utilized as a multiplexed sensor for toxic gases such as NO2 and CO. This method would be very useful for the facile fabrication of functional nanodevices based on highly integrated arrays of heterogeneous nanomaterials. PMID:25634814

  9. Response to Intervention: "Lore v. Law"

    ERIC Educational Resources Information Center

    Zirkel, Perry A.

    2018-01-01

    The legal dimension of response to intervention (RTI) has been the subject of considerable professional confusion. This brief article addresses the issue in three parts. The first part provides an update of a previous iteration that compared 12 common conceptions, referred to here as the "lore," with an objective synthesis of the…

  10. Iterative divergent/convergent doubling approach to linear conjugated oligomers. A rapid route to a 128 A long potential molecular wire

    NASA Astrophysics Data System (ADS)

    Tour, James M.; Schumm, Jeffrey S.; Pearson, Darren L.

    1994-06-01

    Described is the synthesis of oligo (2-ethylphenylene ethynylene)s and oligo (2-(3'ethylheptyl) phenylene ethynylene)s via an iterative divergent convergent approach. Synthesized were the monomer, dimer, tetramer, and octamer of the ethyl derivative and the monomer, dimer, tetramer, octamer, and 16-mer of the ethylheptyl derivative. The 16-mer is 128 A long. At each stage in the iteration, the length of the framework doubles. Only three sets of reaction conditions are needed for the entire iterative synthetic sequence; an iodination, a protodesilylation, and a Pd/Cu-catalyzed cross coupling. The oligomers were characterized spectroscopically and by mass spectrometry. The optical properties are presented which show the stage of optical absorbance saturation. The size exclusion chromatography values for the number average weights, relative to polystyrene, illustrate the tremendous differences in the hydrodynamic volume of these rigid rod oligomers verses the random coils of polystyrene. These differences become quite apparent at the octamer stage. These oligomers may act as molecular wires in molecular electronic devices and they also serve as useful models for understanding related bulk polymers.

  11. Linear scaling computation of the Fock matrix. VI. Data parallel computation of the exchange-correlation matrix

    NASA Astrophysics Data System (ADS)

    Gan, Chee Kwan; Challacombe, Matt

    2003-05-01

    Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.

  12. Solution-Phase Synthesis of Dipeptides: A Capstone Project That Employs Key Techniques in an Organic Laboratory Course

    ERIC Educational Resources Information Center

    Marchetti, Louis; DeBoef, Brenton

    2015-01-01

    A contemporary approach to the synthesis and purification of several UV-active dipeptides has been developed for the second-year organic laboratory. This experiment exposes students to the important technique of solution-phase peptide synthesis and allows an instructor to highlight the parallel between what they are accomplishing in the laboratory…

  13. Efficient synthesis of gamma-lactams by a tandem reductive amination/lactamization sequence.

    PubMed

    Nöth, Julica; Frankowski, Kevin J; Neuenswander, Benjamin; Aubé, Jeffrey; Reiser, Oliver

    2008-01-01

    A three-component method for the synthesis of highly substituted gamma-lactams from readily available maleimides, aldehydes, and amines is described. A new reductive amination/intramolecular lactamization sequence provides a straightforward route to the lactam products in a single manipulation. The general utility of this method is demonstrated by the parallel synthesis of a gamma-lactam library.

  14. Iterative Refinement of a Binding Pocket Model: Active Computational Steering of Lead Optimization

    PubMed Central

    2012-01-01

    Computational approaches for binding affinity prediction are most frequently demonstrated through cross-validation within a series of molecules or through performance shown on a blinded test set. Here, we show how such a system performs in an iterative, temporal lead optimization exercise. A series of gyrase inhibitors with known synthetic order formed the set of molecules that could be selected for “synthesis.” Beginning with a small number of molecules, based only on structures and activities, a model was constructed. Compound selection was done computationally, each time making five selections based on confident predictions of high activity and five selections based on a quantitative measure of three-dimensional structural novelty. Compound selection was followed by model refinement using the new data. Iterative computational candidate selection produced rapid improvements in selected compound activity, and incorporation of explicitly novel compounds uncovered much more diverse active inhibitors than strategies lacking active novelty selection. PMID:23046104

  15. Radar cross-section reduction based on an iterative fast Fourier transform optimized metasurface

    NASA Astrophysics Data System (ADS)

    Song, Yi-Chuan; Ding, Jun; Guo, Chen-Jiang; Ren, Yu-Hui; Zhang, Jia-Kai

    2016-07-01

    A novel polarization insensitive metasurface with over 25 dB monostatic radar cross-section (RCS) reduction is introduced. The proposed metasurface is comprised of carefully arranged unit cells with spatially varied dimension, which enables approximate uniform diffusion of incoming electromagnetic (EM) energy and reduces the threat from bistatic radar system. An iterative fast Fourier transform (FFT) method for conventional antenna array pattern synthesis is innovatively applied to find the best unit cell geometry parameter arrangement. Finally, a metasurface sample is fabricated and tested to validate RCS reduction behavior predicted by full wave simulation software Ansys HFSSTM and marvelous agreement is observed.

  16. Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation

    NASA Astrophysics Data System (ADS)

    Costa, Carlos A. N.; Campos, Itamara S.; Costa, Jessé C.; Neto, Francisco A.; Schleicher, Jörg; Novais, Amélia

    2013-08-01

    Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality.

  17. Region of interest processing for iterative reconstruction in x-ray computed tomography

    NASA Astrophysics Data System (ADS)

    Kopp, Felix K.; Nasirudin, Radin A.; Mei, Kai; Fehringer, Andreas; Pfeiffer, Franz; Rummeny, Ernst J.; Noël, Peter B.

    2015-03-01

    The recent advancements in the graphics card technology raised the performance of parallel computing and contributed to the introduction of iterative reconstruction methods for x-ray computed tomography in clinical CT scanners. Iterative maximum likelihood (ML) based reconstruction methods are known to reduce image noise and to improve the diagnostic quality of low-dose CT. However, iterative reconstruction of a region of interest (ROI), especially ML based, is challenging. But for some clinical procedures, like cardiac CT, only a ROI is needed for diagnostics. A high-resolution reconstruction of the full field of view (FOV) consumes unnecessary computation effort that results in a slower reconstruction than clinically acceptable. In this work, we present an extension and evaluation of an existing ROI processing algorithm. Especially improvements for the equalization between regions inside and outside of a ROI are proposed. The evaluation was done on data collected from a clinical CT scanner. The performance of the different algorithms is qualitatively and quantitatively assessed. Our solution to the ROI problem provides an increase in signal-to-noise ratio and leads to visually less noise in the final reconstruction. The reconstruction speed of our technique was observed to be comparable with other previous proposed techniques. The development of ROI processing algorithms in combination with iterative reconstruction will provide higher diagnostic quality in the near future.

  18. Reducing Design Cycle Time and Cost Through Process Resequencing

    NASA Technical Reports Server (NTRS)

    Rogers, James L.

    2004-01-01

    In today's competitive environment, companies are under enormous pressure to reduce the time and cost of their design cycle. One method for reducing both time and cost is to develop an understanding of the flow of the design processes and the effects of the iterative subcycles that are found in complex design projects. Once these aspects are understood, the design manager can make decisions that take advantage of decomposition, concurrent engineering, and parallel processing techniques to reduce the total time and the total cost of the design cycle. One software tool that can aid in this decision-making process is the Design Manager's Aid for Intelligent Decomposition (DeMAID). The DeMAID software minimizes the feedback couplings that create iterative subcycles, groups processes into iterative subcycles, and decomposes the subcycles into a hierarchical structure. The real benefits of producing the best design in the least time and at a minimum cost are obtained from sequencing the processes in the subcycles.

  19. On the inversion of geodetic integrals defined over the sphere using 1-D FFT

    NASA Astrophysics Data System (ADS)

    García, R. V.; Alejo, C. A.

    2005-08-01

    An iterative method is presented which performs inversion of integrals defined over the sphere. The method is based on one-dimensional fast Fourier transform (1-D FFT) inversion and is implemented with the projected Landweber technique, which is used to solve constrained least-squares problems reducing the associated 1-D cyclic-convolution error. The results obtained are as precise as the direct matrix inversion approach, but with better computational efficiency. A case study uses the inversion of Hotine’s integral to obtain gravity disturbances from geoid undulations. Numerical convergence is also analyzed and comparisons with respect to the direct matrix inversion method using conjugate gradient (CG) iteration are presented. Like the CG method, the number of iterations needed to get the optimum (i.e., small) error decreases as the measurement noise increases. Nevertheless, for discrete data given over a whole parallel band, the method can be applied directly without implementing the projected Landweber method, since no cyclic convolution error exists.

  20. Synthesis of Efficient Structures for Concurrent Computation.

    DTIC Science & Technology

    1983-10-01

    formal presentation of these techniques, called virtualisation and aggregation, can be found n [King-83$. 113.2 Census Functions Trees perform broadcast... Functions .. .. .. .. ... .... ... ... .... ... ... ....... 6 4 User-Assisted Aggregation .. .. .. .. ... ... ... .... ... .. .......... 6 5 Parallel...6. Simple Parallel Structure for Broadcasting .. .. .. .. .. . ... .. . .. . .... 4 Figure 7. Internal Structure of a Prefix Computation Network

  1. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.

    PubMed

    Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T

    2017-01-01

    Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.

  2. Computer architecture evaluation for structural dynamics computations: Project summary

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1989-01-01

    The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

  3. Solutions of large-scale electromagnetics problems involving dielectric objects with the parallel multilevel fast multipole algorithm.

    PubMed

    Ergül, Özgür

    2011-11-01

    Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.

  4. Solution of partial differential equations on vector and parallel computers

    NASA Technical Reports Server (NTRS)

    Ortega, J. M.; Voigt, R. G.

    1985-01-01

    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

  5. The revised solar array synthesis computer program

    NASA Technical Reports Server (NTRS)

    1970-01-01

    The Revised Solar Array Synthesis Computer Program is described. It is a general-purpose program which computes solar array output characteristics while accounting for the effects of temperature, incidence angle, charged-particle irradiation, and other degradation effects on various solar array configurations in either circular or elliptical orbits. Array configurations may consist of up to 75 solar cell panels arranged in any series-parallel combination not exceeding three series-connected panels in a parallel string and no more than 25 parallel strings in an array. Up to 100 separate solar array current-voltage characteristics, corresponding to 100 equal-time increments during the sunlight illuminated portion of an orbit or any 100 user-specified combinations of incidence angle and temperature, can be computed and printed out during one complete computer execution. Individual panel incidence angles may be computed and printed out at the user's option.

  6. Scientific and technical challenges on the road towards fusion electricity

    NASA Astrophysics Data System (ADS)

    Donné, A. J. H.; Federici, G.; Litaudon, X.; McDonald, D. C.

    2017-10-01

    The goal of the European Fusion Roadmap is to deliver fusion electricity to the grid early in the second half of this century. It breaks the quest for fusion energy into eight missions, and for each of them it describes a research and development programme to address all the open technical gaps in physics and technology and estimates the required resources. It points out the needs to intensify industrial involvement and to seek all opportunities for collaboration outside Europe. The roadmap covers three periods: the short term, which runs parallel to the European Research Framework Programme Horizon 2020, the medium term and the long term. ITER is the key facility of the roadmap as it is expected to achieve most of the important milestones on the path to fusion power. Thus, the vast majority of present resources are dedicated to ITER and its accompanying experiments. The medium term is focussed on taking ITER into operation and bringing it to full power, as well as on preparing the construction of a demonstration power plant DEMO, which will for the first time demonstrate fusion electricity to the grid around the middle of this century. Building and operating DEMO is the subject of the last roadmap phase: the long term. Clearly, the Fusion Roadmap is tightly connected to the ITER schedule. Three key milestones are the first operation of ITER, the start of the DT operation in ITER and reaching the full performance at which the thermal fusion power is 10 times the power put in to the plasma. The Engineering Design Activity of DEMO needs to start a few years after the first ITER plasma, while the start of the construction phase will be a few years after ITER reaches full performance. In this way ITER can give viable input to the design and development of DEMO. Because the neutron fluence in DEMO will be much higher than in ITER, it is important to develop and validate materials that can handle these very high neutron loads. For the testing of the materials, a dedicated 14 MeV neutron source is needed. This DEMO Oriented Neutron Source (DONES) is therefore an important facility to support the fusion roadmap.

  7. Nonperturbative measurement of the local magnetic field using pulsed polarimetry for fusion reactor conditions (invited).

    PubMed

    Smith, Roger J

    2008-10-01

    A novel diagnostic technique for the remote and nonperturbative sensing of the local magnetic field in reactor relevant plasmas is presented. Pulsed polarimetry [Patent No. 12/150,169 (pending)] combines optical scattering with the Faraday effect. The polarimetric light detection and ranging (LIDAR)-like diagnostic has the potential to be a local B(pol) diagnostic on ITER and can achieve spatial resolutions of millimeters on high energy density (HED) plasmas using existing lasers. The pulsed polarimetry method is based on nonlocal measurements and subtle effects are introduced that are not present in either cw polarimetry or Thomson scattering LIDAR. Important features include the capability of simultaneously measuring local T(e), n(e), and B(parallel) along the line of sight, a resiliency to refractive effects, a short measurement duration providing near instantaneous data in time, and location for real-time feedback and control of magnetohydrodynamic (MHD) instabilities and the realization of a widely applicable internal magnetic field diagnostic for the magnetic fusion energy program. The technique improves for higher n(e)B(parallel) product and higher n(e) and is well suited for diagnosing the transient plasmas in the HED program. Larger devices such as ITER and DEMO are also better suited to the technique, allowing longer pulse lengths and thereby relaxing key technology constraints making pulsed polarimetry a valuable asset for next step devices. The pulsed polarimetry technique is clarified by way of illustration on the ITER tokamak and plasmas within the magnetized target fusion program within present technological means.

  8. An improved parallel fuzzy connected image segmentation method based on CUDA.

    PubMed

    Wang, Liansheng; Li, Dong; Huang, Shaohui

    2016-05-12

    Fuzzy connectedness method (FC) is an effective method for extracting fuzzy objects from medical images. However, when FC is applied to large medical image datasets, its running time will be greatly expensive. Therefore, a parallel CUDA version of FC (CUDA-kFOE) was proposed by Ying et al. to accelerate the original FC. Unfortunately, CUDA-kFOE does not consider the edges between GPU blocks, which causes miscalculation of edge points. In this paper, an improved algorithm is proposed by adding a correction step on the edge points. The improved algorithm can greatly enhance the calculation accuracy. In the improved method, an iterative manner is applied. In the first iteration, the affinity computation strategy is changed and a look up table is employed for memory reduction. In the second iteration, the error voxels because of asynchronism are updated again. Three different CT sequences of hepatic vascular with different sizes were used in the experiments with three different seeds. NVIDIA Tesla C2075 is used to evaluate our improved method over these three data sets. Experimental results show that the improved algorithm can achieve a faster segmentation compared to the CPU version and higher accuracy than CUDA-kFOE. The calculation results were consistent with the CPU version, which demonstrates that it corrects the edge point calculation error of the original CUDA-kFOE. The proposed method has a comparable time cost and has less errors compared to the original CUDA-kFOE as demonstrated in the experimental results. In the future, we will focus on automatic acquisition method and automatic processing.

  9. Optimal parallel solution of sparse triangular systems

    NASA Technical Reports Server (NTRS)

    Alvarado, Fernando L.; Schreiber, Robert

    1990-01-01

    A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-handed sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-handed sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allow for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.

  10. Computer program MCAP-TOSS calculates steady-state fluid dynamics of coolant in parallel channels and temperature distribution in surrounding heat-generating solid

    NASA Technical Reports Server (NTRS)

    Lee, A. Y.

    1967-01-01

    Computer program calculates the steady state fluid distribution, temperature rise, and pressure drop of a coolant, the material temperature distribution of a heat generating solid, and the heat flux distributions at the fluid-solid interfaces. It performs the necessary iterations automatically within the computer, in one machine run.

  11. On some Aitken-like acceleration of the Schwarz method

    NASA Astrophysics Data System (ADS)

    Garbey, M.; Tromeur-Dervout, D.

    2002-12-01

    In this paper we present a family of domain decomposition based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrids for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that our algorithm has high tolerance to slow network in the context of distributed parallel computing and is attractive, generally speaking, to use with computer architecture for which performance is limited by the memory bandwidth rather than the flop performance of the CPU. This is nowadays the case for most parallel. computer using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.

  12. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Hixon, Duane; Sankar, L. N.

    1993-01-01

    During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.

  13. Optimizing transformations of stencil operations for parallel cache-based architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bassetti, F.; Davis, K.

    This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicity in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation andmore » applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has bee n performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.« less

  14. Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.

    PubMed

    Slażyński, Leszek; Bohte, Sander

    2012-01-01

    The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.

  15. Towards the Experimental Assessment of the DQE in SPECT Scanners

    NASA Astrophysics Data System (ADS)

    Fountos, G. P.; Michail, C. M.

    2017-11-01

    The purpose of this work was to introduce the Detective Quantum Efficiency (DQE) in single photon emission computed tomography (SPECT) systems using a flood source. A Tc-99m-based flood source (Eγ = 140 keV) consisting of a radiopharmaceutical solution of dithiothreitol (DTT, 10-3 M)/Tc-99m(III)-DMSA, 40 mCi/40 ml bound to the grains of an Agfa MammoRay HDR Medical X-ray film) was prepared in laboratory. The source was placed between two PMMA blocks and images were obtained by using the brain tomographic acquisition protocol (DatScan-brain). The Modulation Transfer Function (MTF) was evaluated using the Iterative 2D algorithm. All imaging experiments were performed in a Siemens e-Cam gamma camera. The Normalized Noise Power spectra (NNPS) were obtained from the sagittal views of the source. The higher MTF values were obtained for the Flash Iterative 2D with 24 iterations and 20 subsets. The noise levels of the SPECT reconstructed images, in terms of the NNPS, were found to increase as the number of iterations increase. The behavior of the DQE was influenced by both MTF and NNPS. As the number of iterations was increased, higher MTF values were obtained, however with a parallel, increase of magnitude in image noise, as depicted from the NNPS results. DQE values, which were influenced by both MTF and NNPS, were found higher when the number of iterations results in resolution saturation. The method presented here is novel and easy to implement, requiring materials commonly found in clinical practice and can be useful in the quality control of SPECT scanners.

  16. Development of a stereo analysis algorithm for generating topographic maps using interactive techniques of the MPP

    NASA Technical Reports Server (NTRS)

    Strong, James P.

    1987-01-01

    A local area matching algorithm was developed on the Massively Parallel Processor (MPP). It is an iterative technique that first matches coarse or low resolution areas and at each iteration performs matches of higher resolution. Results so far show that when good matches are possible in the two images, the MPP algorithm matches corresponding areas as well as a human observer. To aid in developing this algorithm, a control or shell program was developed for the MPP that allows interactive experimentation with various parameters and procedures to be used in the matching process. (This would not be possible without the high speed of the MPP). With the system, optimal techniques can be developed for different types of matching problems.

  17. The effect of pre-existing islands on disruption mitigation in MHD simulations of DIII-D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Izzo, V. A.

    Locked-modes are the most likely cause of disruptions in ITER, so large islands are expected to be common when the ITER disruption mitigation system is deployed. MHD modeling of disruption mitigation by massive gas injection is carried out for DIII-D plasmas with stationary, pre-existing islands. Results show that the magnetic topology at the q=2 surface can affect the parallel spreading of injected impurities, and that, in particular, the break-up of large 2/1 islands into smaller 4/2 islands chains can favorably affect mitigation metrics. The direct imposition of a 4/2 mode is found to have similar results to the case inmore » which the 4/2 harmonic grows spontaneously.« less

  18. The effect of pre-existing islands on disruption mitigation in MHD simulations of DIII-D

    DOE PAGES

    Izzo, V. A.

    2017-02-27

    Locked-modes are the most likely cause of disruptions in ITER, so large islands are expected to be common when the ITER disruption mitigation system is deployed. MHD modeling of disruption mitigation by massive gas injection is carried out for DIII-D plasmas with stationary, pre-existing islands. Results show that the magnetic topology at the q=2 surface can affect the parallel spreading of injected impurities, and that, in particular, the break-up of large 2/1 islands into smaller 4/2 islands chains can favorably affect mitigation metrics. The direct imposition of a 4/2 mode is found to have similar results to the case inmore » which the 4/2 harmonic grows spontaneously.« less

  19. Projection matrix acquisition for cone-beam computed tomography iterative reconstruction

    NASA Astrophysics Data System (ADS)

    Yang, Fuqiang; Zhang, Dinghua; Huang, Kuidong; Shi, Wenlong; Zhang, Caixin; Gao, Zongzhao

    2017-02-01

    Projection matrix is an essential and time-consuming part in computed tomography (CT) iterative reconstruction. In this article a novel calculation algorithm of three-dimensional (3D) projection matrix is proposed to quickly acquire the matrix for cone-beam CT (CBCT). The CT data needed to be reconstructed is considered as consisting of the three orthogonal sets of equally spaced and parallel planes, rather than the individual voxels. After getting the intersections the rays with the surfaces of the voxels, the coordinate points and vertex is compared to obtain the index value that the ray traversed. Without considering ray-slope to voxel, it just need comparing the position of two points. Finally, the computer simulation is used to verify the effectiveness of the algorithm.

  20. A conjugate gradient method for solving the non-LTE line radiation transfer problem

    NASA Astrophysics Data System (ADS)

    Paletou, F.; Anterrieu, E.

    2009-12-01

    This study concerns the fast and accurate solution of the line radiation transfer problem, under non-LTE conditions. We propose and evaluate an alternative iterative scheme to the classical ALI-Jacobi method, and to the more recently proposed Gauss-Seidel and successive over-relaxation (GS/SOR) schemes. Our study is indeed based on applying a preconditioned bi-conjugate gradient method (BiCG-P). Standard tests, in 1D plane parallel geometry and in the frame of the two-level atom model with monochromatic scattering are discussed. Rates of convergence between the previously mentioned iterative schemes are compared, as are their respective timing properties. The smoothing capability of the BiCG-P method is also demonstrated.

  1. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less

  2. Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe

    A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less

  3. Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

    DOE PAGES

    Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...

    2017-11-14

    A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less

  4. Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

    NASA Astrophysics Data System (ADS)

    Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.

    2017-11-01

    A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.

  5. S-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Simon, H.

    1998-01-01

    Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less

  6. Massively Parallel Nanostructure Assembly Strategies for Sensing and Information Technology. Phase 2

    DTIC Science & Technology

    2013-05-25

    field. This work has focused on the synthesis of new functional materials and the development of high-throughput, facile methods to assemble...Hong (Seoul National University, Korea). Specifically, gapped nanowires (GNW) were identified as candidate materials for synthesis and assembly as...Throughout the course of this grant, we reported major accomplishments both in the synthesis and assembly of such structures. Synthetically, we report three

  7. Python based high-level synthesis compiler

    NASA Astrophysics Data System (ADS)

    Cieszewski, Radosław; Pozniak, Krzysztof; Romaniuk, Ryszard

    2014-11-01

    This paper presents a python based High-Level synthesis (HLS) compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and map it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. Creating parallel programs implemented in FPGAs is not trivial. This article describes design, implementation and first results of created Python based compiler.

  8. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Sankar, Lakshmi; Hixon, Duane

    1993-01-01

    The work done under this project was documented in detail as the Ph. D. dissertation of Dr. Duane Hixon. The objectives of the research project were evaluation of the generalized minimum residual method (GMRES) as a tool for accelerating 2-D and 3-D unsteady flows and evaluation of the suitability of the GMRES algorithm for unsteady flows, computed on parallel computer architectures.

  9. Methods for design and evaluation of integrated hardware-software systems for concurrent computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    Research activities and publications are briefly summarized. The major tasks reviewed are: (1) VAX implementation of the PISCES parallel programming environment; (2) Apollo workstation network implementation of the PISCES environment; (3) FLEX implementation of the PISCES environment; (4) sparse matrix iterative solver in PSICES Fortran; (5) image processing application of PISCES; and (6) a formal model of concurrent computation being developed.

  10. Discovery of Antibiotics-derived Polymers for Gene Delivery using Combinatorial Synthesis and Cheminformatics Modeling

    PubMed Central

    Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal

    2014-01-01

    We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both, aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709

  11. Burning plasma regime for Fussion-Fission Research Facility

    NASA Astrophysics Data System (ADS)

    Zakharov, Leonid E.

    2010-11-01

    The basic aspects of burning plasma regimes of Fusion-Fission Research Facility (FFRF, R/a=4/1 m/m, Ipl=5 MA, Btor=4-6 T, P^DT=50-100 MW, P^fission=80-4000 MW, 1 m thick blanket), which is suggested as the next step device for Chinese fusion program, are presented. The mission of FFRF is to advance magnetic fusion to the level of a stationary neutron source and to create a technical, scientific, and technology basis for the utilization of high-energy fusion neutrons for the needs of nuclear energy and technology. FFRF will rely as much as possible on ITER design. Thus, the magnetic system, especially TFC, will take advantage of ITER experience. TFC will use the same superconductor as ITER. The plasma regimes will represent an extension of the stationary plasma regimes on HT-7 and EAST tokamaks at ASIPP. Both inductive discharges and stationary non-inductive Lower Hybrid Current Drive (LHCD) will be possible. FFRF strongly relies on new, Lithium Wall Fusion (LiWF) plasma regimes, the development of which will be done on NSTX, HT-7, EAST in parallel with the design work. This regime will eliminate a number of uncertainties, still remaining unresolved in the ITER project. Well controlled, hours long inductive current drive operation at P^DT=50-100 MW is predicted.

  12. Functional materials for breeding blankets—status and developments

    NASA Astrophysics Data System (ADS)

    Konishi, S.; Enoeda, M.; Nakamichi, M.; Hoshino, T.; Ying, A.; Sharafat, S.; Smolentsev, S.

    2017-09-01

    The development of tritium breeder, neutron multiplier and flow channel insert materials for the breeding blanket of the DEMO reactor is reviewed. Present emphasis is on the ITER test blanket module (TBM); lithium metatitanate (Li2TiO3) and lithium orthosilicate (Li4SiO4) pebbles have been developed by leading TBM parties. Beryllium pebbles have been selected as the neutron multiplier. Good progress has been made in their fabrication; however, verification of the design by experiments is in the planning stage. Irradiation data are also limited, but the decrease in thermal conductivity of beryllium due to irradiation followed by swelling is a concern. Tests at ITER are regarded as a major milestone. For the DEMO reactor, improvement of the breeder has been attempted to obtain a higher lithium content, and Be12Ti and other beryllide intermetallic compounds that have superior chemical stability have been studied. LiPb eutectic has been considered as a DEMO blanket in the liquid breeder option and is used as a coolant to achieve a higher outlet temperature; a SiC flow channel insert is used to prevent magnetohydrodynamic pressure drop and corrosion. A significant technical gap between ITER TBM and DEMO is recognized, and the world fusion community is working on ITER TBM and DEMO blanket development in parallel.

  13. External heating and current drive source requirements towards steady-state operation in ITER

    NASA Astrophysics Data System (ADS)

    Poli, F. M.; Kessel, C. E.; Bonoli, P. T.; Batchelor, D. B.; Harvey, R. W.; Snyder, P. B.

    2014-07-01

    Steady state scenarios envisaged for ITER aim at optimizing the bootstrap current, while maintaining sufficient confinement and stability to provide the necessary fusion yield. Non-inductive scenarios will need to operate with internal transport barriers (ITBs) in order to reach adequate fusion gain at typical currents of 9 MA. However, the large pressure gradients associated with ITBs in regions of weak or negative magnetic shear can be conducive to ideal MHD instabilities, reducing the no-wall limit. The E × B flow shear from toroidal plasma rotation is expected to be low in ITER, with a major role in the ITB dynamics being played by magnetic geometry. Combinations of heating and current drive (H/CD) sources that sustain reversed magnetic shear profiles throughout the discharge are the focus of this work. Time-dependent transport simulations indicate that a combination of electron cyclotron (EC) and lower hybrid (LH) waves is a promising route towards steady state operation in ITER. The LH forms and sustains expanded barriers and the EC deposition at mid-radius freezes the bootstrap current profile stabilizing the barrier and leading to confinement levels 50% higher than typical H-mode energy confinement times. Using LH spectra with spectrum centred on parallel refractive index of 1.75-1.85, the performance of these plasma scenarios is close to the ITER target of 9 MA non-inductive current, global confinement gain H98 = 1.6 and fusion gain Q = 5.

  14. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less

  15. Krylov subspace methods on supercomputers

    NASA Technical Reports Server (NTRS)

    Saad, Youcef

    1988-01-01

    A short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers is presented. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to derive effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is in the solution of the triangular systems at each step. A few approaches consisting of implementing efficient forward and backward triangular solutions are described in detail. Polynomial preconditioning as an alternative to standard incomplete factorization techniques is also discussed. Another efficient approach is to reorder the equations so as to improve the structure of the matrix to achieve better parallelism or vectorization. An overview of these and other ideas and their effectiveness or potential for different types of architectures is given.

  16. Sequential and parallel image restoration: neural network implementations.

    PubMed

    Figueiredo, M T; Leitao, J N

    1994-01-01

    Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high dimension convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem into the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm supported on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.

  17. Reducing neural network training time with parallel processing

    NASA Technical Reports Server (NTRS)

    Rogers, James L., Jr.; Lamarsh, William J., II

    1995-01-01

    Obtaining optimal solutions for engineering design problems is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Previous research has shown that a near optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive neural network. A new approach has been developed to further reduce this time. This approach decomposes a large neural network into many smaller neural networks that can be trained in parallel. Guidelines are developed to avoid some of the pitfalls when training smaller neural networks in parallel. These guidelines allow the engineer: to determine the number of nodes on the hidden layer of the smaller neural networks; to choose the initial training weights; and to select a network configuration that will capture the interactions among the smaller neural networks. This paper presents results describing how these guidelines are developed.

  18. Writing and compiling code into biochemistry.

    PubMed

    Shea, Adam; Fett, Brian; Riedel, Marc D; Parhi, Keshab

    2010-01-01

    This paper presents a methodology for translating iterative arithmetic computation, specified as high-level programming constructs, into biochemical reactions. From an input/output specification, we generate biochemical reactions that produce output quantities of proteins as a function of input quantities performing operations such as addition, subtraction, and scalar multiplication. Iterative constructs such as "while" loops and "for" loops are implemented by transferring quantities between protein types, based on a clocking mechanism. Synthesis first is performed at a conceptual level, in terms of abstract biochemical reactions - a task analogous to high-level program compilation. Then the results are mapped onto specific biochemical reactions selected from libraries - a task analogous to machine language compilation. We demonstrate our approach through the compilation of a variety of standard iterative functions: multiplication, exponentiation, discrete logarithms, raising to a power, and linear transforms on time series. The designs are validated through transient stochastic simulation of the chemical kinetics. We are exploring DNA-based computation via strand displacement as a possible experimental chassis.

  19. Iterative reactions of transient boronic acids enable sequential C-C bond formation

    NASA Astrophysics Data System (ADS)

    Battilocchio, Claudio; Feist, Florian; Hafner, Andreas; Simon, Meike; Tran, Duc N.; Allwood, Daniel M.; Blakemore, David C.; Ley, Steven V.

    2016-04-01

    The ability to form multiple carbon-carbon bonds in a controlled sequence and thus rapidly build molecular complexity in an iterative fashion is an important goal in modern chemical synthesis. In recent times, transition-metal-catalysed coupling reactions have dominated in the development of C-C bond forming processes. A desire to reduce the reliance on precious metals and a need to obtain products with very low levels of metal impurities has brought a renewed focus on metal-free coupling processes. Here, we report the in situ preparation of reactive allylic and benzylic boronic acids, obtained by reacting flow-generated diazo compounds with boronic acids, and their application in controlled iterative C-C bond forming reactions is described. Thus far we have shown the formation of up to three C-C bonds in a sequence including the final trapping of a reactive boronic acid species with an aldehyde to generate a range of new chemical structures.

  20. Metal-Acetylacetonate Synthesis Experiments: Which Is Greener?

    ERIC Educational Resources Information Center

    Ribeiro, M. Gabriela T. C.; Machado, Adlio A. S. C.

    2011-01-01

    A procedure for teaching green chemistry through laboratory experiments is presented in which students are challenged to use the 12 principles of green chemistry to review and modify synthesis protocols to improve greenness. A global metric, green star, is used in parallel with green chemistry mass metrics to evaluate the improvement in greenness.…

  1. Peptide arrays on cellulose support: SPOT synthesis, a time and cost efficient method for synthesis of large numbers of peptides in a parallel and addressable fashion.

    PubMed

    Hilpert, Kai; Winkler, Dirk F H; Hancock, Robert E W

    2007-01-01

    Peptide synthesis on cellulose using SPOT technology allows the parallel synthesis of large numbers of addressable peptides in small amounts. In addition, the cost per peptide is less than 1% of peptides synthesized conventionally on resin. The SPOT method follows standard fluorenyl-methoxy-carbonyl chemistry on conventional cellulose sheets, and can utilize more than 600 different building blocks. The procedure involves three phases: preparation of the cellulose membrane, stepwise coupling of the amino acids and cleavage of the side-chain protection groups. If necessary, peptides can be cleaved from the membrane for assays performed using soluble peptides. These features make this method an excellent tool for screening large numbers of peptides for many different purposes. Potential applications range from simple binding assays, to more sophisticated enzyme assays and studies with living microbes or cells. The time required to complete the protocol depends on the number and length of the peptides. For example, 400 9-mer peptides can be synthesized within 6 days.

  2. String-averaging incremental subgradients for constrained convex optimization with applications to reconstruction of tomographic images

    NASA Astrophysics Data System (ADS)

    Massambone de Oliveira, Rafael; Salomão Helou, Elias; Fontoura Costa, Eduardo

    2016-11-01

    We present a method for non-smooth convex minimization which is based on subgradient directions and string-averaging techniques. In this approach, the set of available data is split into sequences (strings) and a given iterate is processed independently along each string, possibly in parallel, by an incremental subgradient method (ISM). The end-points of all strings are averaged to form the next iterate. The method is useful to solve sparse and large-scale non-smooth convex optimization problems, such as those arising in tomographic imaging. A convergence analysis is provided under realistic, standard conditions. Numerical tests are performed in a tomographic image reconstruction application, showing good performance for the convergence speed when measured as the decrease ratio of the objective function, in comparison to classical ISM.

  3. Robust determination of the chemical potential in the pole expansion and selected inversion method for solving Kohn-Sham density functional theory

    NASA Astrophysics Data System (ADS)

    Jia, Weile; Lin, Lin

    2017-10-01

    Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single step of evaluation of the Fermi operator. Hence multiple evaluations are needed to be sequentially performed to compute the chemical potential to ensure the correct number of electrons within a given tolerance. This hinders the performance of FOE methods in practice. In this paper, we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential reaches its convergence along the SCF iteration. Instead of evaluating the Fermi operator for multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics.

  4. 3D unstructured-mesh radiation transport codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morel, J.

    1997-12-31

    Three unstructured-mesh radiation transport codes are currently being developed at Los Alamos National Laboratory. The first code is ATTILA, which uses an unstructured tetrahedral mesh in conjunction with standard Sn (discrete-ordinates) angular discretization, standard multigroup energy discretization, and linear-discontinuous spatial differencing. ATTILA solves the standard first-order form of the transport equation using source iteration in conjunction with diffusion-synthetic acceleration of the within-group source iterations. DANTE is designed to run primarily on workstations. The second code is DANTE, which uses a hybrid finite-element mesh consisting of arbitrary combinations of hexahedra, wedges, pyramids, and tetrahedra. DANTE solves several second-order self-adjoint forms of the transport equation including the even-parity equation, the odd-parity equation, and a new equation called the self-adjoint angular flux equation. DANTE also offers three angular discretization options:more » $$S{_}n$$ (discrete-ordinates), $$P{_}n$$ (spherical harmonics), and $$SP{_}n$$ (simplified spherical harmonics). DANTE is designed to run primarily on massively parallel message-passing machines, such as the ASCI-Blue machines at LANL and LLNL. The third code is PERICLES, which uses the same hybrid finite-element mesh as DANTE, but solves the standard first-order form of the transport equation rather than a second-order self-adjoint form. DANTE uses a standard $$S{_}n$$ discretization in angle in conjunction with trilinear-discontinuous spatial differencing, and diffusion-synthetic acceleration of the within-group source iterations. PERICLES was initially designed to run on workstations, but a version for massively parallel message-passing machines will be built. The three codes will be described in detail and computational results will be presented.« less

  5. Robust determination of the chemical potential in the pole expansion and selected inversion method for solving Kohn-Sham density functional theory.

    PubMed

    Jia, Weile; Lin, Lin

    2017-10-14

    Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single step of evaluation of the Fermi operator. Hence multiple evaluations are needed to be sequentially performed to compute the chemical potential to ensure the correct number of electrons within a given tolerance. This hinders the performance of FOE methods in practice. In this paper, we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential reaches its convergence along the SCF iteration. Instead of evaluating the Fermi operator for multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics.

  6. Regularization Parameter Selection for Nonlinear Iterative Image Restoration and MRI Reconstruction Using GCV and SURE-Based Methods

    PubMed Central

    Ramani, Sathish; Liu, Zhihao; Rosen, Jeffrey; Nielsen, Jon-Fredrik; Fessler, Jeffrey A.

    2012-01-01

    Regularized iterative reconstruction algorithms for imaging inverse problems require selection of appropriate regularization parameter values. We focus on the challenging problem of tuning regularization parameters for nonlinear algorithms for the case of additive (possibly complex) Gaussian noise. Generalized cross-validation (GCV) and (weighted) mean-squared error (MSE) approaches (based on Stein's Unbiased Risk Estimate— SURE) need the Jacobian matrix of the nonlinear reconstruction operator (representative of the iterative algorithm) with respect to the data. We derive the desired Jacobian matrix for two types of nonlinear iterative algorithms: a fast variant of the standard iterative reweighted least-squares method and the contemporary split-Bregman algorithm, both of which can accommodate a wide variety of analysis- and synthesis-type regularizers. The proposed approach iteratively computes two weighted SURE-type measures: Predicted-SURE and Projected-SURE (that require knowledge of noise variance σ2), and GCV (that does not need σ2) for these algorithms. We apply the methods to image restoration and to magnetic resonance image (MRI) reconstruction using total variation (TV) and an analysis-type ℓ1-regularization. We demonstrate through simulations and experiments with real data that minimizing Predicted-SURE and Projected-SURE consistently lead to near-MSE-optimal reconstructions. We also observed that minimizing GCV yields reconstruction results that are near-MSE-optimal for image restoration and slightly sub-optimal for MRI. Theoretical derivations in this work related to Jacobian matrix evaluations can be extended, in principle, to other types of regularizers and reconstruction algorithms. PMID:22531764

  7. A fresh look at electron cyclotron current drive power requirements for stabilization of tearing modes in ITER

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    La Haye, R. J., E-mail: lahaye@fusion.gat.com

    2015-12-10

    ITER is an international project to design and build an experimental fusion reactor based on the “tokamak” concept. ITER relies upon localized electron cyclotron current drive (ECCD) at the rational safety factor q=2 to suppress or stabilize the expected poloidal mode m=2, toroidal mode n=1 neoclassical tearing mode (NTM) islands. Such islands if unmitigated degrade energy confinement, lock to the resistive wall (stop rotating), cause loss of “H-mode” and induce disruption. The International Tokamak Physics Activity (ITPA) on MHD, Disruptions and Magnetic Control joint experiment group MDC-8 on Current Drive Prevention/Stabilization of Neoclassical Tearing Modes started in 2005, after whichmore » assessments were made for the requirements for ECCD needed in ITER, particularly that of rf power and alignment on q=2 [1]. Narrow well-aligned rf current parallel to and of order of one percent of the total plasma current is needed to replace the “missing” current in the island O-points and heal or preempt (avoid destabilization by applying ECCD on q=2 in absence of the mode) the island [2-4]. This paper updates the advances in ECCD stabilization on NTMs learned in DIII-D experiments and modeling during the last 5 to 10 years as applies to stabilization by localized ECCD of tearing modes in ITER. This includes the ECCD (inside the q=1 radius) stabilization of the NTM “seeding” instability known as sawteeth (m/n=1/1) [5]. Recent measurements in DIII-D show that the ITER-similar current profile is classically unstable, curvature stabilization must not be neglected, and the small island width stabilization effect from helical ion polarization currents is stronger than was previously thought [6]. The consequences of updated assumptions in ITER modeling of the minimum well-aligned ECCD power needed are all-in-all favorable (and well-within the ITER 24 gyrotron capability) when all effects are included. However, a “wild card” may be broadening of the localized ECCD by the presence of the island; various theories predict broadening could occur and there is experimental evidence for broadening in DIII-D. Wider than now expected ECCD in ITER would make alignment easier to do but weaken the stabilization and thus require more rf power. In addition to updated modeling for ITER, advances in the ITER-relevant DIII-D ECCD gyrotron launch mirror control system hardware and real-time plasma control system have been made [7] and there are plans for application in DIII-D ITER demonstration discharges.« less

  8. A fresh look at electron cyclotron current drive power requirements for stabilization of tearing modes in ITER

    NASA Astrophysics Data System (ADS)

    La Haye, R. J.

    2015-12-01

    ITER is an international project to design and build an experimental fusion reactor based on the "tokamak" concept. ITER relies upon localized electron cyclotron current drive (ECCD) at the rational safety factor q=2 to suppress or stabilize the expected poloidal mode m=2, toroidal mode n=1 neoclassical tearing mode (NTM) islands. Such islands if unmitigated degrade energy confinement, lock to the resistive wall (stop rotating), cause loss of "H-mode" and induce disruption. The International Tokamak Physics Activity (ITPA) on MHD, Disruptions and Magnetic Control joint experiment group MDC-8 on Current Drive Prevention/Stabilization of Neoclassical Tearing Modes started in 2005, after which assessments were made for the requirements for ECCD needed in ITER, particularly that of rf power and alignment on q=2 [1]. Narrow well-aligned rf current parallel to and of order of one percent of the total plasma current is needed to replace the "missing" current in the island O-points and heal or preempt (avoid destabilization by applying ECCD on q=2 in absence of the mode) the island [2-4]. This paper updates the advances in ECCD stabilization on NTMs learned in DIII-D experiments and modeling during the last 5 to 10 years as applies to stabilization by localized ECCD of tearing modes in ITER. This includes the ECCD (inside the q=1 radius) stabilization of the NTM "seeding" instability known as sawteeth (m/n=1/1) [5]. Recent measurements in DIII-D show that the ITER-similar current profile is classically unstable, curvature stabilization must not be neglected, and the small island width stabilization effect from helical ion polarization currents is stronger than was previously thought [6]. The consequences of updated assumptions in ITER modeling of the minimum well-aligned ECCD power needed are all-in-all favorable (and well-within the ITER 24 gyrotron capability) when all effects are included. However, a "wild card" may be broadening of the localized ECCD by the presence of the island; various theories predict broadening could occur and there is experimental evidence for broadening in DIII-D. Wider than now expected ECCD in ITER would make alignment easier to do but weaken the stabilization and thus require more rf power. In addition to updated modeling for ITER, advances in the ITER-relevant DIII-D ECCD gyrotron launch mirror control system hardware and real-time plasma control system have been made [7] and there are plans for application in DIII-D ITER demonstration discharges.

  9. Fault Accommodation in Control of Flexible Systems

    NASA Technical Reports Server (NTRS)

    Maghami, Peiman G.; Sparks, Dean W., Jr.; Lim, Kyong B.

    1998-01-01

    New synthesis techniques for the design of fault accommodating controllers for flexible systems are developed. Three robust control design strategies, static dissipative, dynamic dissipative and mu-synthesis, are used in the approach. The approach provides techniques for designing controllers that maximize, in some sense, the tolerance of the closed-loop system against faults in actuators and sensors, while guaranteeing performance robustness at a specified performance level, measured in terms of the proximity of the closed-loop poles to the imaginary axis (the degree of stability). For dissipative control designs, nonlinear programming is employed to synthesize the controllers, whereas in mu-synthesis, the traditional D-K iteration is used. To demonstrate the feasibility of the proposed techniques, they are applied to the control design of a structural model of a flexible laboratory test structure.

  10. Multilevel decomposition of complete vehicle configuration in a parallel computing environment

    NASA Technical Reports Server (NTRS)

    Bhatt, Vinay; Ragsdell, K. M.

    1989-01-01

    This research summarizes various approaches to multilevel decomposition to solve large structural problems. A linear decomposition scheme based on the Sobieski algorithm is selected as a vehicle for automated synthesis of a complete vehicle configuration in a parallel processing environment. The research is in a developmental state. Preliminary numerical results are presented for several example problems.

  11. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  12. Automated synthesis of a 96 product-sized library of triazole derivatives using a solid phase supported copper catalyst.

    PubMed

    Jlalia, Ibtissem; Beauvineau, Claire; Beauvière, Sophie; Onen, Esra; Aufort, Marie; Beauvineau, Aymeric; Khaba, Eihab; Herscovici, Jean; Meganem, Faouzi; Girard, Christian

    2010-04-28

    This article deal with the parallel synthesis of a 96 product-sized library using a polymer-based copper catalyst that we developed which can be easily separated from the products by simple filtration. This gave us the opportunity to use this catalyst in an automated chemical synthesis station (Chemspeed ASW-2000). Studies and results about the preparation of the catalyst, its use in different solvent systems, its recycling capabilities and its scope and limitations in the synthesis of this library will be addressed. The synthesis of the triazole library and the very good results obtained will finally be discussed.

  13. THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media

    NASA Astrophysics Data System (ADS)

    Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu

    2015-07-01

    The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.

  14. Scaling Optimization of the SIESTA MHD Code

    NASA Astrophysics Data System (ADS)

    Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan

    2013-10-01

    SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.

  15. Symmetry Breaking by Parallel Flow Shear

    NASA Astrophysics Data System (ADS)

    Li, Jiacong; Diamond, Patrick

    2015-11-01

    Plasma rotation is important in reducing turbulent transport, suppressing MHD instabilities, and is beneficial to confinement. Intrinsic rotation without an external momentum input is of interest for its plausible application on ITER. k∥ spectrum asymmetry is required for residual Reynolds stress that drives the intrinsic rotation. Parallel flows are reported in linear devices without magnetic shear. In CSDX, parallel flows are mostly peaked in the core [Thakur et al., 2014]; more robust flows and reversed profiles are seen in PANTA [Oldenburger, et al. 2012]. A novel mechanism for symmetry breaking in momentum transport is proposed. Magnetic shear or mean flow profile are not required. A seed parallel flow shear (PFS) sets the sign of residual stress by selecting certain modes to grow faster. The resulted spectrum imbalance leads to a nonzero residual stress, which further drives a parallel flow with ∇n as the free energy source, adding to the shear until saturated by diffusion. Balanced flow gradient is set by Π∥Res /χϕ . Residual stress is calculated for ITG turbulence and collisional drift wave turbulence where electron-ion and electron-neutral collisions are discussed and compared. Numerical simulation is proposed for testing the effect of PFS.

  16. A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

    NASA Astrophysics Data System (ADS)

    Ma, Sangback

    In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering ahd ILU(0) in the Wavefront ordering outperform the other methods but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.

  17. Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

    NASA Astrophysics Data System (ADS)

    Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

    2014-06-01

    We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the parallel context.

  18. Combining living anionic polymerization with branching reactions in an iterative fashion to design branched polymers.

    PubMed

    Higashihara, Tomoya; Sugiyama, Kenji; Yoo, Hee-Soo; Hayashi, Mayumi; Hirao, Akira

    2010-06-16

    This paper reviews the precise synthesis of many-armed and multi-compositional star-branched polymers, exact graft (co)polymers, and structurally well-defined dendrimer-like star-branched polymers, which are synthetically difficult, by a commonly-featured iterative methodology combining living anionic polymerization with branched reactions to design branched polymers. The methodology basically involves only two synthetic steps; (a) preparation of a polymeric building block corresponding to each branched polymer and (b) connection of the resulting building unit to another unit. The synthetic steps were repeated in a stepwise fashion several times to successively synthesize a series of well-defined target branched polymers. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. VINE: A Variational Inference -Based Bayesian Neural Network Engine

    DTIC Science & Technology

    2018-01-01

    networks are trained using the same dataset and hyper parameter settings as discussed. Table 1 Performance evaluation of the proposed transfer learning...multiplication/addition/subtraction. These operations can be implemented using nested loops in which various iterations of a loop are independent of...each other. This introduces an opportunity for optimization where a loop may be unrolled fully or partially to increase parallelism at the cost of

  20. Parallel Worlds: Agile and Waterfall Differences and Similarities

    DTIC Science & Technology

    2013-10-01

    development model , and it is deliberately shorter than the Agile Overview as most readers are assumed to be from the Traditional World. For a more in...process of DODI 5000 does not forbid the iterative incremental software development model with frequent end-user interaction, it requires heroics on...added). Today, many of the DOD’s large IT programs therefore continue to adopt program structures and software development models closely

  1. Terascale Optimal PDE Simulations (TOPS) Center

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Professor Olof B. Widlund

    2007-07-09

    Our work has focused on the development and analysis of domain decomposition algorithms for a variety of problems arising in continuum mechanics modeling. In particular, we have extended and analyzed FETI-DP and BDDC algorithms; these iterative solvers were first introduced and studied by Charbel Farhat and his collaborators, see [11, 45, 12], and by Clark Dohrmann of SANDIA, Albuquerque, see [43, 2, 1], respectively. These two closely related families of methods are of particular interest since they are used more extensively than other iterative substructuring methods to solve very large and difficult problems. Thus, the FETI algorithms are part ofmore » the SALINAS system developed by the SANDIA National Laboratories for very large scale computations, and as already noted, BDDC was first developed by a SANDIA scientist, Dr. Clark Dohrmann. The FETI algorithms are also making inroads in commercial engineering software systems. We also note that the analysis of these algorithms poses very real mathematical challenges. The success in developing this theory has, in several instances, led to significant improvements in the performance of these algorithms. A very desirable feature of these iterative substructuring and other domain decomposition algorithms is that they respect the memory hierarchy of modern parallel and distributed computing systems, which is essential for approaching peak floating point performance. The development of improved methods, together with more powerful computer systems, is making it possible to carry out simulations in three dimensions, with quite high resolution, relatively easily. This work is supported by high quality software systems, such as Argonne's PETSc library, which facilitates code development as well as the access to a variety of parallel and distributed computer systems. The success in finding scalable and robust domain decomposition algorithms for very large number of processors and very large finite element problems is, e.g., illustrated in [24, 25, 26]. This work is based on [29, 31]. Our work over these five and half years has, in our opinion, helped advance the knowledge of domain decomposition methods significantly. We see these methods as providing valuable alternatives to other iterative methods, in particular, those based on multi-grid. In our opinion, our accomplishments also match the goals of the TOPS project quite closely.« less

  2. LDPC decoder with a limited-precision FPGA-based floating-point multiplication coprocessor

    NASA Astrophysics Data System (ADS)

    Moberly, Raymond; O'Sullivan, Michael; Waheed, Khurram

    2007-09-01

    Implementing the sum-product algorithm, in an FPGA with an embedded processor, invites us to consider a tradeoff between computational precision and computational speed. The algorithm, known outside of the signal processing community as Pearl's belief propagation, is used for iterative soft-decision decoding of LDPC codes. We determined the feasibility of a coprocessor that will perform product computations. Our FPGA-based coprocessor (design) performs computer algebra with significantly less precision than the standard (e.g. integer, floating-point) operations of general purpose processors. Using synthesis, targeting a 3,168 LUT Xilinx FPGA, we show that key components of a decoder are feasible and that the full single-precision decoder could be constructed using a larger part. Soft-decision decoding by the iterative belief propagation algorithm is impacted both positively and negatively by a reduction in the precision of the computation. Reducing precision reduces the coding gain, but the limited-precision computation can operate faster. A proposed solution offers custom logic to perform computations with less precision, yet uses the floating-point format to interface with the software. Simulation results show the achievable coding gain. Synthesis results help theorize the the full capacity and performance of an FPGA-based coprocessor.

  3. Synthesis of Aryl-Substituted 2,4-Dinitrophenylamines: Nucleophilic Aromatic Substitution as a Problem-Solving and Collaborative-Learning Approach

    ERIC Educational Resources Information Center

    Santos, Elvira Santos; Garcia, Irma Cruz Gavilan; Gomez, Eva Florencia Lejarazo; Vilchis-Reyes, Miguel Angel

    2010-01-01

    A series of experiments based on problem-solving and collaborative-learning pedagogies are described that encourage students to interpret results and draw conclusions from data. Different approaches including parallel library synthesis, solvent variation, and leaving group variation are used to study a nucleophilic aromatic substitution of…

  4. Prospects for Advanced Tokamak Operation of ITER

    NASA Astrophysics Data System (ADS)

    Neilson, George H.

    1996-11-01

    Previous studies have identified steady-state (or "advanced") modes for ITER, based on reverse-shear profiles and significant bootstrap current. A typical example has 12 MA of plasma current, 1,500 MW of fusion power, and 100 MW of heating and current-drive power. The implementation of these and other steady-state operating scenarios in the ITER device is examined in order to identify key design modifications that can enhance the prospects for successfully achieving advanced tokamak operating modes in ITER compatible with a single null divertor design. In particular, we examine plasma configurations that can be achieved by the ITER poloidal field system with either a monolithic central solenoid (as in the ITER Interim Design), or an alternate "hybrid" central solenoid design which provides for greater flexibility in the plasma shape. The increased control capability and expanded operating space provided by the hybrid central solenoid allows operation at high triangularity (beneficial for improving divertor performance through control of edge-localized modes and for increasing beta limits), and will make it much easier for ITER operators to establish an optimum startup trajectory leading to a high-performance, steady-state scenario. Vertical position control is examined because plasmas made accessible by the hybrid central solenoid can be more elongated and/or less well coupled to the conducting structure. Control of vertical-displacements using the external PF coils remains feasible over much of the expanded operating space. Further work is required to define the full spectrum of axisymmetric plasma disturbances requiring active control In addition to active axisymmetric control, advanced tokamak modes in ITER may require active control of kink modes on the resistive time scale of the conducting structure. This might be accomplished in ITER through the use of active control coils external to the vacuum vessel which are actuated by magnetic sensors near the first wall. The enhanced shaping and positioning flexibility provides a range of options for reducing the ripple-induced losses of fast alpha particles--a major limitation on ITER steady-state modes. An alternate approach that we are pursuing in parallel is the inclusion of ferromagnetic inserts to reduce the toroidal field ripple within the plasma chamber. The inclusion of modest design changes such as the hybrid central solenoid, active control coils for kink modes, and ferromagnetic inserts for TF ripple reduction show can greatly increase the flexibility to accommodate advance tokamak operation in ITER. Increased flexibility is important because the optimum operating scenario for ITER cannot be predicted with certainty. While low-inductance, reverse shear modes appear attractive for steady-state operation, high-inductance, high-beta modes are also viable candidates, and it is important that ITER have the flexibility to explore both these, and other, operating regimes.

  5. Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Xujun; Li, Jiyuan; Jiang, Xikai

    An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less

  6. Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

    NASA Astrophysics Data System (ADS)

    Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

    2016-05-01

    This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.

  7. Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

    DOE PAGES

    Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...

    2017-06-29

    An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallelmore » Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.« less

  8. Performance of a parallel thermal-hydraulics code TEMPEST

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fann, G.I.; Trent, D.S.

    The authors describe the parallelization of the Tempest thermal-hydraulics code. The serial version of this code is used for production quality 3-D thermal-hydraulics simulations. Good speedup was obtained with a parallel diagonally preconditioned BiCGStab non-symmetric linear solver, using a spatial domain decomposition approach for the semi-iterative pressure-based and mass-conserved algorithm. The test case used here to illustrate the performance of the BiCGStab solver is a 3-D natural convection problem modeled using finite volume discretization in cylindrical coordinates. The BiCGStab solver replaced the LSOR-ADI method for solving the pressure equation in TEMPEST. BiCGStab also solves the coupled thermal energy equation. Scalingmore » performance of 3 problem sizes (221220 nodes, 358120 nodes, and 701220 nodes) are presented. These problems were run on 2 different parallel machines: IBM-SP and SGI PowerChallenge. The largest problem attains a speedup of 68 on an 128 processor IBM-SP. In real terms, this is over 34 times faster than the fastest serial production time using the LSOR-ADI solver.« less

  9. Computational method and system for modeling, analyzing, and optimizing DNA amplification and synthesis

    DOEpatents

    Vandersall, Jennifer A.; Gardner, Shea N.; Clague, David S.

    2010-05-04

    A computational method and computer-based system of modeling DNA synthesis for the design and interpretation of PCR amplification, parallel DNA synthesis, and microarray chip analysis. The method and system include modules that address the bioinformatics, kinetics, and thermodynamics of DNA amplification and synthesis. Specifically, the steps of DNA selection, as well as the kinetics and thermodynamics of DNA hybridization and extensions, are addressed, which enable the optimization of the processing and the prediction of the products as a function of DNA sequence, mixing protocol, time, temperature and concentration of species.

  10. SIMULATION AND VISUALIZATION OF FLOW PATTERN IN MICROARRAYS FOR LIQUID PHASE OLIGONUCLEOTIDE AND PEPTIDE SYNTHESIS

    PubMed Central

    O-Charoen, Sirimon; Srivannavit, Onnop; Gulari, Erdogan

    2008-01-01

    Microfluidic microarrays have been developed for economical and rapid parallel synthesis of oligonucleotide and peptide libraries. For a synthesis system to be reproducible and uniform, it is crucial to have a uniform reagent delivery throughout the system. Computational fluid dynamics (CFD) is used to model and simulate the microfluidic microarrays to study geometrical effects on flow patterns. By proper design geometry, flow uniformity could be obtained in every microreactor in the microarrays. PMID:17480053

  11. Parallel synthesis of ureas and carbamates from amines and CO2 under mild conditions.

    PubMed

    Peterson, Scott L; Stucka, Sabrina M; Dinsmore, Christopher J

    2010-03-19

    A mild and efficient library synthesis technique has been developed for the synthesis of ureas and carbamates from carbamic acids derived from the DBU-catalyzed reaction of amines and gaseous carbon dioxide. Carbamic acids derived from primary amines reacted with Mitsunobu reagents to generate isocyanates in situ which were condensed with primary and secondary amines to afford the desired ureas. Similarly, carbamic acids from secondary amines reacted with alcohols activated with Mitsunobu reagents to form carbamates.

  12. A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Ferguson, Helaman R. P.

    1988-01-01

    Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spotz, William F.

    PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of themore » underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.« less

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aliaga, José I., E-mail: aliaga@uji.es; Alonso, Pedro; Badía, José M.

    We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousandsmore » degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.« less

  15. THERMAL DESIGN OF THE ITER VACUUM VESSEL COOLING SYSTEM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carbajo, Juan J; Yoder Jr, Graydon L; Kim, Seokho H

    RELAP5-3D models of the ITER Vacuum Vessel (VV) Primary Heat Transfer System (PHTS) have been developed. The design of the cooling system is described in detail, and RELAP5 results are presented. Two parallel pump/heat exchanger trains comprise the design one train is for full-power operation and the other is for emergency operation or operation at decay heat levels. All the components are located inside the Tokamak building (a significant change from the original configurations). The results presented include operation at full power, decay heat operation, and baking operation. The RELAP5-3D results confirm that the design can operate satisfactorily during bothmore » normal pulsed power operation and decay heat operation. All the temperatures in the coolant and in the different system components are maintained within acceptable operating limits.« less

  16. Hybrid cloud and cluster computing paradigms for life science applications

    PubMed Central

    2010-01-01

    Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982

  17. Hybrid cloud and cluster computing paradigms for life science applications.

    PubMed

    Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey

    2010-12-21

    Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

  18. Conceptual study of the cryocascade for pumping, separation and recycling of ITER torus exhaust

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mack, A.; Perinic, D.

    1994-12-31

    For pumping, separation and recycling of the ITER plasma exhaust, a pumping system working reliably under ambient and operating conditions of the ITER reactor is required. The pump is exposed to a magnetic field of about 3 T and has to be resistant to radioactive irradiation of 10{sup 9} rad. In the burn and dwell phase, a gas mixture consisting of hydrogen isotopes, helium ash and impurities has to be pumped at the pressure range of 10{sup {minus}2} - 10{sup {minus}1} mbar. Within the framework of the European Fusion Technology Programme, the concept of a primary cryopump for use inmore » ITER is being prepared at KfK. The cryocascade concept is planned to include three pump stages. These pump stages, which are connected in series, consist of individual chambers that may be separated from each other by means of cold valves. In the first stage, the impurities of the reactor exhaust gas are frozen out at 20-30 K. Settling of the hydrogen isotopes H/D/T on the 5 K cryosurfaces takes place in the second stage. This stage is made up of two parallel chambers, which can be switched from the pumping to the regeneration mode or vice versa. The helium fraction is bound in the downstream 5 K adsorption stage.« less

  19. Increasing feasibility of the field-programmable gate array implementation of an iterative image registration using a kernel-warping algorithm

    NASA Astrophysics Data System (ADS)

    Nguyen, An Hung; Guillemette, Thomas; Lambert, Andrew J.; Pickering, Mark R.; Garratt, Matthew A.

    2017-09-01

    Image registration is a fundamental image processing technique. It is used to spatially align two or more images that have been captured at different times, from different sensors, or from different viewpoints. There have been many algorithms proposed for this task. The most common of these being the well-known Lucas-Kanade (LK) and Horn-Schunck approaches. However, the main limitation of these approaches is the computational complexity required to implement the large number of iterations necessary for successful alignment of the images. Previously, a multi-pass image interpolation algorithm (MP-I2A) was developed to considerably reduce the number of iterations required for successful registration compared with the LK algorithm. This paper develops a kernel-warping algorithm (KWA), a modified version of the MP-I2A, which requires fewer iterations to successfully register two images and less memory space for the field-programmable gate array (FPGA) implementation than the MP-I2A. These reductions increase feasibility of the implementation of the proposed algorithm on FPGAs with very limited memory space and other hardware resources. A two-FPGA system rather than single FPGA system is successfully developed to implement the KWA in order to compensate insufficiency of hardware resources supported by one FPGA, and increase parallel processing ability and scalability of the system.

  20. Multiple Representations-Based Face Sketch-Photo Synthesis.

    PubMed

    Peng, Chunlei; Gao, Xinbo; Wang, Nannan; Tao, Dacheng; Li, Xuelong; Li, Jie

    2016-11-01

    Face sketch-photo synthesis plays an important role in law enforcement and digital entertainment. Most of the existing methods only use pixel intensities as the feature. Since face images can be described using features from multiple aspects, this paper presents a novel multiple representations-based face sketch-photo-synthesis method that adaptively combines multiple representations to represent an image patch. In particular, it combines multiple features from face images processed using multiple filters and deploys Markov networks to exploit the interacting relationships between the neighboring image patches. The proposed framework could be solved using an alternating optimization strategy and it normally converges in only five outer iterations in the experiments. Our experimental results on the Chinese University of Hong Kong (CUHK) face sketch database, celebrity photos, CUHK Face Sketch FERET Database, IIIT-D Viewed Sketch Database, and forensic sketches demonstrate the effectiveness of our method for face sketch-photo synthesis. In addition, cross-database and database-dependent style-synthesis evaluations demonstrate the generalizability of this novel method and suggest promising solutions for face identification in forensic science.

  1. Solution- and solid-phase parallel synthesis of 4-alkoxy-substituted pyrimidines with high molecular diversity.

    PubMed

    Font, David; Heras, Montserrat; Villalgordo, José M

    2003-01-01

    A simple and straightforward methodology toward the synthesis of novel 2,6-disubstituted-4-alkoxypyrimidine derivatives of type 16 and 19 has been developed. This methodology, initially developed in solution, can be perfectly adapted to the solid support under analogous conditions, taking full advantage of automated parallel synthesis systems. This successful methodology benefits from the key role played by the thioether linkage placed at the 2-position in 3, 9, or 13 in a double manner: on one side, the steric effect exerted by the thioether linkage is likely to be responsible for the very high observed selectivity toward the formation of the O-alkylation products. On the other side, this sulfur linkage can serve not only as a robust point of attachment for the heterocycle, stable to a number of reaction conditions, but also as a means of introducing a new element of diversity through activation to the corresponding sulfone (safety-catch linker concept) and subsequent ipso-substitution reaction with a variety of different N-nucleophiles.

  2. RPython high-level synthesis

    NASA Astrophysics Data System (ADS)

    Cieszewski, Radoslaw; Linczuk, Maciej

    2016-09-01

    The development of FPGA technology and the increasing complexity of applications in recent decades have forced compilers to move to higher abstraction levels. Compilers interprets an algorithmic description of a desired behavior written in High-Level Languages (HLLs) and translate it to Hardware Description Languages (HDLs). This paper presents a RPython based High-Level synthesis (HLS) compiler. The compiler get the configuration parameters and map RPython program to VHDL. Then, VHDL code can be used to program FPGA chips. In comparison of other technologies usage, FPGAs have the potential to achieve far greater performance than software as a result of omitting the fetch-decode-execute operations of General Purpose Processors (GPUs), and introduce more parallel computation. This can be exploited by utilizing many resources at the same time. Creating parallel algorithms computed with FPGAs in pure HDL is difficult and time consuming. Implementation time can be greatly reduced with High-Level Synthesis compiler. This article describes design methodologies and tools, implementation and first results of created VHDL backend for RPython compiler.

  3. Fast simulation techniques for switching converters

    NASA Technical Reports Server (NTRS)

    King, Roger J.

    1987-01-01

    Techniques for simulating a switching converter are examined. The state equations for the equivalent circuits, which represent the switching converter, are presented and explained. The uses of the Newton-Raphson iteration, low ripple approximation, half-cycle symmetry, and discrete time equations to compute the interval durations are described. An example is presented in which these methods are illustrated by applying them to a parallel-loaded resonant inverter with three equivalent circuits for its continuous mode of operation.

  4. Approximation algorithms for scheduling unrelated parallel machines with release dates

    NASA Astrophysics Data System (ADS)

    Avdeenko, T. V.; Mesentsev, Y. A.; Estraykh, I. V.

    2017-01-01

    In this paper we propose approaches to optimal scheduling of unrelated parallel machines with release dates. One approach is based on the scheme of dynamic programming modified with adaptive narrowing of search domain ensuring its computational effectiveness. We discussed complexity of the exact schedules synthesis and compared it with approximate, close to optimal, solutions. Also we explain how the algorithm works for the example of two unrelated parallel machines and five jobs with release dates. Performance results that show the efficiency of the proposed approach have been given.

  5. Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    2003-01-01

    A hierarchical set of image segmentations is a set of several image segmentations of the same image at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton, et a1 describes an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g. Horowitz and T. Pavlidis, [3]). In addition, HSEG optionally interjects between HSWO region growing iterations, merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG s computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer, implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG s recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic region growing.

  6. Efficient Synthesis of γ-Lactams by a Tandem Reductive Amination/Lactamization Sequence

    PubMed Central

    Nöth, Julica; Frankowski, Kevin J.; Neuenswander, Benjamin; Aubé, Jeffrey; Reiser, Oliver

    2009-01-01

    A three-component method for synthesizing highly-substituted γ-lactams from readily available maleimides, aldehydes and amines is described. A new reductive amination/intramolecular lactamization sequence provides a straightforward route to the lactam products in a single manipulation. The general utility of this method is demonstrated by the parallel synthesis of a γ-lactam library. PMID:18338857

  7. A COMPARISON OF EXPERIMENTS AND THREE-DIMENSIONAL ANALYSIS TECHNIQUES. PART II. UNPOISONED UNIFORM SLAB CORE WITH A PARTIALLY INSERTED HAFNIUM ROD AND A PARTIALLY INSERTED WATER GAP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roseberry, R.J.

    The experimental measurements and nuclear analysis of a uniformly loaded, unpoisoned slab core with a partially inserted hafnium rod and/or a partially inserted water gap are described. Comparisons of experimental data with calculated results of the UFO core and flux synthesis techniques are given. It is concluded that one of the flux synthesis techniques and the UFO code are able to predict flux distributions to within approximately -5% of experiment for most cases, with a maximum error of approximately -10% for a channel at the core- reflector boundary. The second synthesis technique failed to give comparable agreement with experiment evenmore » when various refinements were used, e.g. increasing the number of mesh points, performing the flux synthesis technique of iteration, and spectrum-weighting the appropriate calculated fluxes through the use of the SWAKRAUM code. These results are comparable to those reported in Part I of this study. (auth)« less

  8. Spectrum synthesis for a spectrally tunable light source based on a DMD-convex grating Offner configuration

    NASA Astrophysics Data System (ADS)

    Ma, Suodong; Pan, Qiao; Shen, Weimin

    2016-09-01

    As one kind of light source simulation devices, spectrally tunable light sources are able to generate specific spectral shape and radiant intensity outputs according to different application requirements, which have urgent demands in many fields of the national economy and the national defense industry. Compared with the LED-type spectrally tunable light source, the one based on a DMD-convex grating Offner configuration has advantages of high spectral resolution, strong digital controllability, high spectrum synthesis accuracy, etc. As a key link of the above type light source to achieve target spectrum outputs, spectrum synthesis algorithm based on spectrum matching is therefore very important. An improved spectrum synthesis algorithm based on linear least square initialization and Levenberg-Marquardt iterative optimization is proposed in this paper on the basis of in-depth study of the spectrum matching principle. The effectiveness of the proposed method is verified by a series of simulations and experimental works.

  9. A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

    1992-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.

  10. Modal analysis of a nonuniform string with end mass and variable tension

    NASA Technical Reports Server (NTRS)

    Rheinfurth, M. H.; Galaboff, Z. J.

    1983-01-01

    Modal synthesis techniques for dynamic systems containing strings describe the lateral displacements of these strings by properly chosen shape functions. An iterative algorithm is provided to calculate the natural modes of a nonuniform string and variable tension for some typical boundary conditions including one end mass. Numerical examples are given for a string in a constant and a gravity gradient force field.

  11. Low-authority control synthesis for large space structures

    NASA Technical Reports Server (NTRS)

    Aubrun, J. N.; Margulies, G.

    1982-01-01

    The control of vibrations of large space structures by distributed sensors and actuators is studied. A procedure is developed for calculating the feedback loop gains required to achieve specified amounts of damping. For moderate damping (Low Authority Control) the procedure is purely algebraic, but it can be applied iteratively when larger amounts of damping are required and is generalized for arbitrary time invariant systems.

  12. Modal Test/Analysis Correlation of Space Station Structures Using Nonlinear Sensitivity

    NASA Technical Reports Server (NTRS)

    Gupta, Viney K.; Newell, James F.; Berke, Laszlo; Armand, Sasan

    1992-01-01

    The modal correlation problem is formulated as a constrained optimization problem for validation of finite element models (FEM's). For large-scale structural applications, a pragmatic procedure for substructuring, model verification, and system integration is described to achieve effective modal correlation. The space station substructure FEM's are reduced using Lanczos vectors and integrated into a system FEM using Craig-Bampton component modal synthesis. The optimization code is interfaced with MSC/NASTRAN to solve the problem of modal test/analysis correlation; that is, the problem of validating FEM's for launch and on-orbit coupled loads analysis against experimentally observed frequencies and mode shapes. An iterative perturbation algorithm is derived and implemented to update nonlinear sensitivity (derivatives of eigenvalues and eigenvectors) during optimizer iterations, which reduced the number of finite element analyses.

  13. Modal test/analysis correlation of Space Station structures using nonlinear sensitivity

    NASA Technical Reports Server (NTRS)

    Gupta, Viney K.; Newell, James F.; Berke, Laszlo; Armand, Sasan

    1992-01-01

    The modal correlation problem is formulated as a constrained optimization problem for validation of finite element models (FEM's). For large-scale structural applications, a pragmatic procedure for substructuring, model verification, and system integration is described to achieve effective modal correlations. The space station substructure FEM's are reduced using Lanczos vectors and integrated into a system FEM using Craig-Bampton component modal synthesis. The optimization code is interfaced with MSC/NASTRAN to solve the problem of modal test/analysis correlation; that is, the problem of validating FEM's for launch and on-orbit coupled loads analysis against experimentally observed frequencies and mode shapes. An iterative perturbation algorithm is derived and implemented to update nonlinear sensitivity (derivatives of eigenvalues and eigenvectors) during optimizer iterations, which reduced the number of finite element analyses.

  14. Robust distributed model predictive control of linear systems with structured time-varying uncertainties

    NASA Astrophysics Data System (ADS)

    Zhang, Langwen; Xie, Wei; Wang, Jingcheng

    2017-11-01

    In this work, synthesis of robust distributed model predictive control (MPC) is presented for a class of linear systems subject to structured time-varying uncertainties. By decomposing a global system into smaller dimensional subsystems, a set of distributed MPC controllers, instead of a centralised controller, are designed. To ensure the robust stability of the closed-loop system with respect to model uncertainties, distributed state feedback laws are obtained by solving a min-max optimisation problem. The design of robust distributed MPC is then transformed into solving a minimisation optimisation problem with linear matrix inequality constraints. An iterative online algorithm with adjustable maximum iteration is proposed to coordinate the distributed controllers to achieve a global performance. The simulation results show the effectiveness of the proposed robust distributed MPC algorithm.

  15. An asymptotic-preserving Lagrangian algorithm for the time-dependent anisotropic heat transport equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.

    2014-09-01

    We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while themore » second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio X ⊥ /X ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = X⊥L 2 ∥/X1L 2 ⊥ → 0 (with L∥∙ L⊥ , the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.« less

  16. Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

    NASA Astrophysics Data System (ADS)

    Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

    2015-05-01

    3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of electrohydrodynamic (EHD) ion-drag micropump. Finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, simulation results are also compared to examine the difference between the results. The main objective was to analyze the computational time required by both the methods with respect to different grid sizes and parallelize the Jacobi method to reduce the computational time. In common, the SGS method is faster than the SJ method but the data parallelism of Jacobi method may produce good speedup over SGS method. In this study, the feasibility of using parallel Jacobi (PJ) method is attempted in relation to SGS method. MATLAB Parallel/Distributed computing environment is used and a parallel code for SJ method is implemented. It was found that for small grid size the SGS method remains dominant over SJ method and PJ method while for large grid size both the sequential methods may take nearly too much processing time to converge. Yet, the PJ method reduces computational time to some extent for large grid sizes.

  17. Parallel ptychographic reconstruction

    DOE PAGES

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; ...

    2014-12-19

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps tomore » take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.« less

  18. Gyrokinetic theory of turbulent acceleration and momentum conservation in tokamak plasmas

    NASA Astrophysics Data System (ADS)

    Lu, WANG; Shuitao, PENG; P, H. DIAMOND

    2018-07-01

    Understanding the generation of intrinsic rotation in tokamak plasmas is crucial for future fusion reactors such as ITER. We proposed a new mechanism named turbulent acceleration for the origin of the intrinsic parallel rotation based on gyrokinetic theory. The turbulent acceleration acts as a local source or sink of parallel rotation, i.e., volume force, which is different from the divergence of residual stress, i.e., surface force. However, the order of magnitude of turbulent acceleration can be comparable to that of the divergence of residual stress for electrostatic ion temperature gradient (ITG) turbulence. A possible theoretical explanation for the experimental observation of electron cyclotron heating induced decrease of co-current rotation was also proposed via comparison between the turbulent acceleration driven by ITG turbulence and that driven by collisionless trapped electron mode turbulence. We also extended this theory to electromagnetic ITG turbulence and investigated the electromagnetic effects on intrinsic parallel rotation drive. Finally, we demonstrated that the presence of turbulent acceleration does not conflict with momentum conservation.

  19. Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.

    PubMed

    Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J

    2015-07-05

    We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  20. Design and performance evaluation of a 20-aperture multipinhole collimator for myocardial perfusion imaging applications.

    PubMed

    Bowen, Jason D; Huang, Qiu; Ellin, Justin R; Lee, Tzu-Cheng; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

    2013-10-21

    Single photon emission computed tomography (SPECT) myocardial perfusion imaging remains a critical tool in the diagnosis of coronary artery disease. However, after more than three decades of use, photon detection efficiency remains poor and unchanged. This is due to the continued reliance on parallel-hole collimators first introduced in 1964. These collimators possess poor geometric efficiency. Here we present the performance evaluation results of a newly designed multipinhole collimator with 20 pinhole apertures (PH20) for commercial SPECT systems. Computer simulations and numerical observer studies were used to assess the noise, bias and diagnostic imaging performance of a PH20 collimator in comparison with those of a low energy high resolution (LEHR) parallel-hole collimator. Ray-driven projector/backprojector pairs were used to model SPECT imaging acquisitions, including simulation of noiseless projection data and performing MLEM/OSEM image reconstructions. Poisson noise was added to noiseless projections for realistic projection data. Noise and bias performance were investigated for five mathematical cardiac and torso (MCAT) phantom anatomies imaged at two gantry orbit positions (19.5 and 25.0 cm). PH20 and LEHR images were reconstructed with 300 MLEM iterations and 30 OSEM iterations (ten subsets), respectively. Diagnostic imaging performance was assessed by a receiver operating characteristic (ROC) analysis performed on a single MCAT phantom; however, in this case PH20 images were reconstructed with 75 pixel-based OSEM iterations (four subsets). Four PH20 projection views from two positions of a dual-head camera acquisition and 60 LEHR projections were simulated for all studies. At uniformly-imposed resolution of 12.5 mm, significant improvements in SNR and diagnostic sensitivity (represented by the area under the ROC curve, or AUC) were realized when PH20 collimators are substituted for LEHR parallel-hole collimators. SNR improves by factors of 1.94-2.34 for the five patient anatomies and two orbital positions studied. For the ROC analysis the PH20 AUC is larger than the LEHR AUC with a p-value of 0.0067. Bias performance, however, decreases with the use of PH20 collimators. Systematic analyses showed PH20 collimators present improved diagnostic imaging performance over LEHR collimators, requiring only collimator exchange on existing SPECT cameras for their use.

  1. Design and performance evaluation of a 20-aperture multipinhole collimator for myocardial perfusion imaging applications

    NASA Astrophysics Data System (ADS)

    Bowen, Jason D.; Huang, Qiu; Ellin, Justin R.; Lee, Tzu-Cheng; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

    2013-10-01

    Single photon emission computed tomography (SPECT) myocardial perfusion imaging remains a critical tool in the diagnosis of coronary artery disease. However, after more than three decades of use, photon detection efficiency remains poor and unchanged. This is due to the continued reliance on parallel-hole collimators first introduced in 1964. These collimators possess poor geometric efficiency. Here we present the performance evaluation results of a newly designed multipinhole collimator with 20 pinhole apertures (PH20) for commercial SPECT systems. Computer simulations and numerical observer studies were used to assess the noise, bias and diagnostic imaging performance of a PH20 collimator in comparison with those of a low energy high resolution (LEHR) parallel-hole collimator. Ray-driven projector/backprojector pairs were used to model SPECT imaging acquisitions, including simulation of noiseless projection data and performing MLEM/OSEM image reconstructions. Poisson noise was added to noiseless projections for realistic projection data. Noise and bias performance were investigated for five mathematical cardiac and torso (MCAT) phantom anatomies imaged at two gantry orbit positions (19.5 and 25.0 cm). PH20 and LEHR images were reconstructed with 300 MLEM iterations and 30 OSEM iterations (ten subsets), respectively. Diagnostic imaging performance was assessed by a receiver operating characteristic (ROC) analysis performed on a single MCAT phantom; however, in this case PH20 images were reconstructed with 75 pixel-based OSEM iterations (four subsets). Four PH20 projection views from two positions of a dual-head camera acquisition and 60 LEHR projections were simulated for all studies. At uniformly-imposed resolution of 12.5 mm, significant improvements in SNR and diagnostic sensitivity (represented by the area under the ROC curve, or AUC) were realized when PH20 collimators are substituted for LEHR parallel-hole collimators. SNR improves by factors of 1.94-2.34 for the five patient anatomies and two orbital positions studied. For the ROC analysis the PH20 AUC is larger than the LEHR AUC with a p-value of 0.0067. Bias performance, however, decreases with the use of PH20 collimators. Systematic analyses showed PH20 collimators present improved diagnostic imaging performance over LEHR collimators, requiring only collimator exchange on existing SPECT cameras for their use.

  2. Extreme Carrier Depletion and Superlinear Photoconductivity in Ultrathin Parallel-Aligned ZnO Nanowire Array Photodetectors Fabricated by Infiltration Synthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nam, Chang-Yong; Stein, Aaron

    Ultrathin semiconductor nanowires enable high-performance chemical sensors and photodetectors, but their synthesis and device integration by standard complementary metal-oxide-semiconductor (CMOS)-compatible processes remain persistent challenges. This work demonstrates fully CMOS-compatible synthesis and integration of parallel-aligned polycrystalline ZnO nanowire arrays into ultraviolet photodetectors via infiltration synthesis, material hybridization technique derived from atomic layer deposition. The nanowire photodetector features unique, high device performances originating from extreme charge carrier depletion, achieving photoconductive on–off ratios of >6 decades, blindness to visible light, and ultralow dark currents as low as 1 fA, the lowest reported for nanostructure-based photoconductive photodetectors. Surprisingly, the low dark current is invariantmore » with increasing number of nanowires and the photodetector shows unusual superlinear photoconductivity, observed for the first time in nanowires, leading to increasing detector responsivity and other parameters for higher incident light powers. Temperature-dependent carrier concentration and mobility reveal the photoelectrochemical-thermionic emission process at grain boundaries, responsible for the observed unique photodetector performances and superlinear photoconductivity. Here, the results elucidate fundamental processes responsible for photogain in polycrystalline nanostructures, providing useful guidelines for developing nanostructure-based detectors and sensors. Lastly, the developed fully CMOS-compatible nanowire synthesis and device fabrication methods also have potentials for scalable integration of nanowire sensor devices and circuitries.« less

  3. Extreme Carrier Depletion and Superlinear Photoconductivity in Ultrathin Parallel-Aligned ZnO Nanowire Array Photodetectors Fabricated by Infiltration Synthesis

    DOE PAGES

    Nam, Chang-Yong; Stein, Aaron

    2017-11-15

    Ultrathin semiconductor nanowires enable high-performance chemical sensors and photodetectors, but their synthesis and device integration by standard complementary metal-oxide-semiconductor (CMOS)-compatible processes remain persistent challenges. This work demonstrates fully CMOS-compatible synthesis and integration of parallel-aligned polycrystalline ZnO nanowire arrays into ultraviolet photodetectors via infiltration synthesis, material hybridization technique derived from atomic layer deposition. The nanowire photodetector features unique, high device performances originating from extreme charge carrier depletion, achieving photoconductive on–off ratios of >6 decades, blindness to visible light, and ultralow dark currents as low as 1 fA, the lowest reported for nanostructure-based photoconductive photodetectors. Surprisingly, the low dark current is invariantmore » with increasing number of nanowires and the photodetector shows unusual superlinear photoconductivity, observed for the first time in nanowires, leading to increasing detector responsivity and other parameters for higher incident light powers. Temperature-dependent carrier concentration and mobility reveal the photoelectrochemical-thermionic emission process at grain boundaries, responsible for the observed unique photodetector performances and superlinear photoconductivity. Here, the results elucidate fundamental processes responsible for photogain in polycrystalline nanostructures, providing useful guidelines for developing nanostructure-based detectors and sensors. Lastly, the developed fully CMOS-compatible nanowire synthesis and device fabrication methods also have potentials for scalable integration of nanowire sensor devices and circuitries.« less

  4. Development of structural schemes of parallel structure manipulators using screw calculus

    NASA Astrophysics Data System (ADS)

    Rashoyan, G. V.; Shalyukhin, K. A.; Gaponenko, EV

    2018-03-01

    The paper considers the approach to the structural analysis and synthesis of parallel structure robots based on the mathematical apparatus of groups of screws and on a concept of reciprocity of screws. The results are depicted of synthesis of parallel structure robots with different numbers of degrees of freedom, corresponding to the different groups of screws. Power screws are applied with this aim, based on the principle of static-kinematic analogy; the power screws are similar to the orts of axes of not driven kinematic pairs of a corresponding connecting chain. Accordingly, kinematic screws of the outlet chain of a robot are simultaneously determined which are reciprocal to power screws of kinematic sub-chains. Solution of certain synthesis problems is illustrated with practical applications. Closed groups of screws can have eight types. The three-membered groups of screws are of greatest significance, as well as four-membered screw groups [1] and six-membered screw groups. Three-membered screw groups correspond to progressively guiding mechanisms, to spherical mechanisms, and to planar mechanisms. The four-membered group corresponds to the motion of the SCARA robot. The six-membered group includes all possible motions. From the works of A.P. Kotelnikov, F.M. Dimentberg, it is known that closed fifth-order screw groups do not exist. The article presents examples of the mechanisms corresponding to the given groups.

  5. Digital tomosynthesis mammography using a parallel maximum-likelihood reconstruction method

    NASA Astrophysics Data System (ADS)

    Wu, Tao; Zhang, Juemin; Moore, Richard; Rafferty, Elizabeth; Kopans, Daniel; Meleis, Waleed; Kaeli, David

    2004-05-01

    A parallel reconstruction method, based on an iterative maximum likelihood (ML) algorithm, is developed to provide fast reconstruction for digital tomosynthesis mammography. Tomosynthesis mammography acquires 11 low-dose projections of a breast by moving an x-ray tube over a 50° angular range. In parallel reconstruction, each projection is divided into multiple segments along the chest-to-nipple direction. Using the 11 projections, segments located at the same distance from the chest wall are combined to compute a partial reconstruction of the total breast volume. The shape of the partial reconstruction forms a thin slab, angled toward the x-ray source at a projection angle 0°. The reconstruction of the total breast volume is obtained by merging the partial reconstructions. The overlap region between neighboring partial reconstructions and neighboring projection segments is utilized to compensate for the incomplete data at the boundary locations present in the partial reconstructions. A serial execution of the reconstruction is compared to a parallel implementation, using clinical data. The serial code was run on a PC with a single PentiumIV 2.2GHz CPU. The parallel implementation was developed using MPI and run on a 64-node Linux cluster using 800MHz Itanium CPUs. The serial reconstruction for a medium-sized breast (5cm thickness, 11cm chest-to-nipple distance) takes 115 minutes, while a parallel implementation takes only 3.5 minutes. The reconstruction time for a larger breast using a serial implementation takes 187 minutes, while a parallel implementation takes 6.5 minutes. No significant differences were observed between the reconstructions produced by the serial and parallel implementations.

  6. Lessons from the synthetic chemist nature.

    PubMed

    Jürjens, Gerrit; Kirschning, Andreas; Candito, David A

    2015-05-01

    This conceptual review examines the ideal multistep synthesis from the perspective of nature. We suggest that besides step- and redox economies, one other key to efficiency is steady state processing with intermediates that are immediately transformed to the next intermediate when formed. We discuss four of nature's strategies (multicatalysis, domino reactions, iteration and compartmentation) that commonly proceed via short-lived intermediates and show that these strategies are also part of the chemist's portfolio. We particularly focus on compartmentation which in nature is found microscopically within cells (organelles) and between cells and on a molecular level on multiprotein scaffolds (e.g. in polyketide synthases) and demonstrate how compartmentation is manifested in modern multistep flow synthesis.

  7. CAD-Based Shielding Analysis for ITER Port Diagnostics

    NASA Astrophysics Data System (ADS)

    Serikov, Arkady; Fischer, Ulrich; Anthoine, David; Bertalot, Luciano; De Bock, Maartin; O'Connor, Richard; Juarez, Rafael; Krasilnikov, Vitaly

    2017-09-01

    Radiation shielding analysis conducted in support of design development of the contemporary diagnostic systems integrated inside the ITER ports is relied on the use of CAD models. This paper presents the CAD-based MCNP Monte Carlo radiation transport and activation analyses for the Diagnostic Upper and Equatorial Port Plugs (UPP #3 and EPP #8, #17). The creation process of the complicated 3D MCNP models of the diagnostics systems was substantially accelerated by application of the CAD-to-MCNP converter programs MCAM and McCad. High performance computing resources of the Helios supercomputer allowed to speed-up the MCNP parallel transport calculations with the MPI/OpenMP interface. The found shielding solutions could be universal, reducing ports R&D costs. The shield block behind the Tritium and Deposit Monitor (TDM) optical box was added to study its influence on Shut-Down Dose Rate (SDDR) in Port Interspace (PI) of EPP#17. Influence of neutron streaming along the Lost Alpha Monitor (LAM) on the neutron energy spectra calculated in the Tangential Neutron Spectrometer (TNS) of EPP#8. For the UPP#3 with Charge eXchange Recombination Spectroscopy (CXRS-core), an excessive neutron streaming along the CXRS shutter, which should be prevented in further design iteration.

  8. Development of a real-time system for ITER first wall heat load control

    NASA Astrophysics Data System (ADS)

    Anand, Himank; de Vries, Peter; Gribov, Yuri; Pitts, Richard; Snipes, Joseph; Zabeo, Luca

    2017-10-01

    The steady state heat flux on the ITER first wall (FW) panels are limited by the heat removal capacity of the water cooling system. In case of off-normal events (e.g. plasma displacement during H-L transitions), the heat loads are predicted to exceed the design limits (2-4.7 MW/m2). Intense heat loads are predicted on the FW, even well before the burning plasma phase. Thus, a real-time (RT) FW heat load control system is mandatory from early plasma operation of the ITER tokamak. A heat load estimator based on the RT equilibrium reconstruction has been developed for the plasma control system (PCS). A scheme, estimating the energy state for prescribed gaps defined as the distance between the last closed flux surface (LCFS)/separatrix and the FW is presented. The RT energy state is determined by the product of a weighted function of gap distance and the power crossing the plasma boundary. In addition, a heat load estimator assuming a simplified FW geometry and parallel heat transport model in the scrape-off layer (SOL), benchmarked against a full 3-D magnetic field line tracer is also presented.

  9. A finite element solver for 3-D compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Reddy, K. C.; Reddy, J. N.; Nayani, S.

    1990-01-01

    Computation of the flow field inside a space shuttle main engine (SSME) requires the application of state of the art computational fluid dynamic (CFD) technology. Several computer codes are under development to solve 3-D flow through the hot gas manifold. Some algorithms were designed to solve the unsteady compressible Navier-Stokes equations, either by implicit or explicit factorization methods, using several hundred or thousands of time steps to reach a steady state solution. A new iterative algorithm is being developed for the solution of the implicit finite element equations without assembling global matrices. It is an efficient iteration scheme based on a modified nonlinear Gauss-Seidel iteration with symmetric sweeps. The algorithm is analyzed for a model equation and is shown to be unconditionally stable. Results from a series of test problems are presented. The finite element code was tested for couette flow, which is flow under a pressure gradient between two parallel plates in relative motion. Another problem that was solved is viscous laminar flow over a flat plate. The general 3-D finite element code was used to compute the flow in an axisymmetric turnaround duct at low Mach numbers.

  10. The ITER ICRF Antenna Design with TOPICA

    NASA Astrophysics Data System (ADS)

    Milanesio, Daniele; Maggiora, Riccardo; Meneghini, Orso; Vecchi, Giuseppe

    2007-11-01

    TOPICA (Torino Polytechnic Ion Cyclotron Antenna) code is an innovative tool for the 3D/1D simulation of Ion Cyclotron Radio Frequency (ICRF), i.e. accounting for antennas in a realistic 3D geometry and with an accurate 1D plasma model [1]. The TOPICA code has been deeply parallelized and has been already proved to be a reliable tool for antennas design and performance prediction. A detailed analysis of the 24 straps ITER ICRF antenna geometry has been carried out, underlining the strong dependence and asymmetries of the antenna input parameters due to the ITER plasma response. We optimized the antenna array geometry dimensions to maximize loading, lower mutual couplings and mitigate sheath effects. The calculated antenna input impedance matrices are TOPICA results of a paramount importance for the tuning and matching system design. Electric field distributions have been also calculated and they are used as the main input for the power flux estimation tool. The designed optimized antenna is capable of coupling 20 MW of power to plasma in the 40 -- 55 MHz frequency range with a maximum voltage of 45 kV in the feeding coaxial cables. [1] V. Lancellotti et al., Nuclear Fusion, 46 (2006) S476-S499

  11. Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments

    PubMed Central

    Wolverton, Christopher; Hattrick-Simpers, Jason; Mehta, Apurva

    2018-01-01

    With more than a hundred elements in the periodic table, a large number of potential new materials exist to address the technological and societal challenges we face today; however, without some guidance, searching through this vast combinatorial space is frustratingly slow and expensive, especially for materials strongly influenced by processing. We train a machine learning (ML) model on previously reported observations, parameters from physiochemical theories, and make it synthesis method–dependent to guide high-throughput (HiTp) experiments to find a new system of metallic glasses in the Co-V-Zr ternary. Experimental observations are in good agreement with the predictions of the model, but there are quantitative discrepancies in the precise compositions predicted. We use these discrepancies to retrain the ML model. The refined model has significantly improved accuracy not only for the Co-V-Zr system but also across all other available validation data. We then use the refined model to guide the discovery of metallic glasses in two additional previously unreported ternaries. Although our approach of iterative use of ML and HiTp experiments has guided us to rapid discovery of three new glass-forming systems, it has also provided us with a quantitatively accurate, synthesis method–sensitive predictor for metallic glasses that improves performance with use and thus promises to greatly accelerate discovery of many new metallic glasses. We believe that this discovery paradigm is applicable to a wider range of materials and should prove equally powerful for other materials and properties that are synthesis path–dependent and that current physiochemical theories find challenging to predict. PMID:29662953

  12. Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Fang; Ward, Logan; Williams, Travis

    With more than a hundred elements in the periodic table, a large number of potential new materials exist to address the technological and societal challenges we face today; however, without some guidance, searching through this vast combinatorial space is frustratingly slow and expensive, especially for materials strongly influenced by processing. We train a machine learning (ML) model on previously reported observations, parameters from physiochemical theories, and make it synthesis method–dependent to guide high-throughput (HiTp) experiments to find a new system of metallic glasses in the Co-V-Zr ternary. Experimental observations are in good agreement with the predictions of the model, butmore » there are quantitative discrepancies in the precise compositions predicted. We use these discrepancies to retrain the ML model. The refined model has significantly improved accuracy not only for the Co-V-Zr system but also across all other available validation data. We then use the refined model to guide the discovery of metallic glasses in two additional previously unreported ternaries. Although our approach of iterative use of ML and HiTp experiments has guided us to rapid discovery of three new glass-forming systems, it has also provided us with a quantitatively accurate, synthesis method–sensitive predictor for metallic glasses that improves performance with use and thus promises to greatly accelerate discovery of many new metallic glasses. We believe that this discovery paradigm is applicable to a wider range of materials and should prove equally powerful for other materials and properties that are synthesis path–dependent and that current physiochemical theories find challenging to predict.« less

  13. Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments

    DOE PAGES

    Ren, Fang; Ward, Logan; Williams, Travis; ...

    2018-04-01

    With more than a hundred elements in the periodic table, a large number of potential new materials exist to address the technological and societal challenges we face today; however, without some guidance, searching through this vast combinatorial space is frustratingly slow and expensive, especially for materials strongly influenced by processing. We train a machine learning (ML) model on previously reported observations, parameters from physiochemical theories, and make it synthesis method–dependent to guide high-throughput (HiTp) experiments to find a new system of metallic glasses in the Co-V-Zr ternary. Experimental observations are in good agreement with the predictions of the model, butmore » there are quantitative discrepancies in the precise compositions predicted. We use these discrepancies to retrain the ML model. The refined model has significantly improved accuracy not only for the Co-V-Zr system but also across all other available validation data. We then use the refined model to guide the discovery of metallic glasses in two additional previously unreported ternaries. Although our approach of iterative use of ML and HiTp experiments has guided us to rapid discovery of three new glass-forming systems, it has also provided us with a quantitatively accurate, synthesis method–sensitive predictor for metallic glasses that improves performance with use and thus promises to greatly accelerate discovery of many new metallic glasses. We believe that this discovery paradigm is applicable to a wider range of materials and should prove equally powerful for other materials and properties that are synthesis path–dependent and that current physiochemical theories find challenging to predict.« less

  14. An improved synthesis of pyran-3,5-dione: application to the synthesis of ABT-598, a potassium channel opener, via Hantzsch reaction.

    PubMed

    Li, Wenke; Wayne, Gregory S; Lallaman, John E; Chang, Sou-Jen; Wittenberger, Steven J

    2006-02-17

    Ketoester 1 is cyclized to give pyran-3,5-dione 2 in 78% yield using a parallel addition of ketoester 1 and base NaO(t)Bu in refluxing THF. Compared to the previously reported procedures, these optimized conditions have significantly increased the yield of this transformation and the quality of pyran 2 and prove to be suitable for large-scale preparation. An application of 2 to the synthesis of ABT-598, a potassium channel opener, is demonstrated.

  15. Further development of a robust workup process for solution-phase high-throughput library synthesis to address environmental and sample tracking issues.

    PubMed

    Kuroda, Noritaka; Hird, Nick; Cork, David G

    2006-01-01

    During further improvement of a high-throughput, solution-phase synthesis system, new workup tools and apparatus for parallel liquid-liquid extraction and evaporation have been developed. A combination of in-house design and collaboration with external manufacturers has been used to address (1) environmental issues concerning solvent emissions and (2) sample tracking errors arising from manual intervention. A parallel liquid-liquid extraction unit, containing miniature high-speed magnetic stirrers for efficient mixing of organic and aqueous phases, has been developed for use on a multichannel liquid handler. Separation of the phases is achieved by dispensing them into a newly patented filter tube containing a vertical hydrophobic porous membrane, which allows only the organic phase to pass into collection vials positioned below. The vertical positioning of the membrane overcomes the hitherto dependence on the use of heavier-than-water, bottom-phase, organic solvents such as dichloromethane, which are restricted due to environmental concerns. Both small (6-mL) and large (60-mL) filter tubes were developed for parallel phase separation in library and template synthesis, respectively. In addition, an apparatus for parallel solvent evaporation was developed to (1) remove solvent from the above samples with highly efficient recovery and (2) avoid the movement of individual samples between their collection on a liquid handler and registration to prevent sample identification errors. The apparatus uses a diaphragm pump to achieve a dynamic circulating closed system with a heating block for the rack of 96 sample vials and an efficient condenser to trap the solvents. Solvent recovery is typically >98%, and convenient operation and monitoring has made the apparatus the first choice for removal of volatile solvents.

  16. Optical systolic array processor using residue arithmetic

    NASA Technical Reports Server (NTRS)

    Jackson, J.; Casasent, D.

    1983-01-01

    The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.

  17. Power deposition on misaligned castellated tungsten blocks in the Magnum-PSI and Pilot-PSI linear devices

    NASA Astrophysics Data System (ADS)

    Morgan, T. W.; van den Berg, M. A.; De Temmerman, G.; Bardin, S.; Aussems, D. U. B.; Pitts, R. A.

    2017-12-01

    For the final design of the ITER divertor it is important to determine whether shaping of each tungsten monoblock to eliminate leading edges is required or not. In order to aid this decision, two experiments were performed in DIFFER’s linear plasma devices to study heat loads on misaligned water cooled blocks at glancing incidence. First, a series of tungsten blocks were exposed to a high parallel heat flux (26 MW \

  18. Research in computer science

    NASA Technical Reports Server (NTRS)

    Ortega, J. M.

    1985-01-01

    Synopses are given for NASA supported work in computer science at the University of Virginia. Some areas of research include: error seeding as a testing method; knowledge representation for engineering design; analysis of faults in a multi-version software experiment; implementation of a parallel programming environment; two computer graphics systems for visualization of pressure distribution and convective density particles; task decomposition for multiple robot arms; vectorized incomplete conjugate gradient; and iterative methods for solving linear equations on the Flex/32.

  19. Exploiting Data Sparsity in Parallel Matrix Powers Computations

    DTIC Science & Technology

    2013-05-03

    2013 Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour...matrices of the form A = D+USV H, where D is sparse and USV H has low rank but may be dense. Matrices of this form arise in many practical applications...methods numerical partial di erential equation solvers, and preconditioned iterative methods. If A has this form , our algorithm enables a communication

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, Edmond

    Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then solving these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the factorization that is desired.

  1. Calculation of Moment Matrix Elements for Bilinear Quadrilaterals and Higher-Order Basis Functions

    DTIC Science & Technology

    2016-01-06

    methods are known as boundary integral equation (BIE) methods and the present study falls into this category. The numerical solution of the BIE is...iterated integrals. The inner integral involves the product of the free-space Green’s function for the Helmholtz equation multiplied by an appropriate...Website: http://www.wipl-d.com/ 5. Y. Zhang and T. K. Sarkar, Parallel Solution of Integral Equation -Based EM Problems in the Frequency Domain. New

  2. Synthesis of most polyene natural product motifs using just 12 building blocks and one coupling reaction.

    PubMed

    Woerly, Eric M; Roy, Jahnabi; Burke, Martin D

    2014-06-01

    The inherent modularity of polypeptides, oligonucleotides and oligosaccharides has been harnessed to achieve generalized synthesis platforms. Importantly, like these other targets, most small-molecule natural products are biosynthesized via iterative coupling of bifunctional building blocks. This suggests that many small molecules also possess inherent modularity commensurate with systematic building block-based construction. Supporting this hypothesis, here we report that the polyene motifs found in >75% of all known polyene natural products can be synthesized using just 12 building blocks and one coupling reaction. Using the same general retrosynthetic algorithm and reaction conditions, this platform enabled both the synthesis of a wide range of polyene frameworks that covered all of this natural-product chemical space and the first total syntheses of the polyene natural products asnipyrone B, physarigin A and neurosporaxanthin b-D-glucopyranoside. Collectively, these results suggest the potential for a more generalized approach to making small molecules in the laboratory.

  3. Synthesis of most polyene natural product motifs using just 12 building blocks and one coupling reaction

    NASA Astrophysics Data System (ADS)

    Woerly, Eric M.; Roy, Jahnabi; Burke, Martin D.

    2014-06-01

    The inherent modularity of polypeptides, oligonucleotides and oligosaccharides has been harnessed to achieve generalized synthesis platforms. Importantly, like these other targets, most small-molecule natural products are biosynthesized via iterative coupling of bifunctional building blocks. This suggests that many small molecules also possess inherent modularity commensurate with systematic building block-based construction. Supporting this hypothesis, here we report that the polyene motifs found in >75% of all known polyene natural products can be synthesized using just 12 building blocks and one coupling reaction. Using the same general retrosynthetic algorithm and reaction conditions, this platform enabled both the synthesis of a wide range of polyene frameworks that covered all of this natural-product chemical space and the first total syntheses of the polyene natural products asnipyrone B, physarigin A and neurosporaxanthin β-D-glucopyranoside. Collectively, these results suggest the potential for a more generalized approach to making small molecules in the laboratory.

  4. Resources for global risk assessment: the International Toxicity Estimates for Risk (ITER) and Risk Information Exchange (RiskIE) databases.

    PubMed

    Wullenweber, Andrea; Kroner, Oliver; Kohrman, Melissa; Maier, Andrew; Dourson, Michael; Rak, Andrew; Wexler, Philip; Tomljanovic, Chuck

    2008-11-15

    The rate of chemical synthesis and use has outpaced the development of risk values and the resolution of risk assessment methodology questions. In addition, available risk values derived by different organizations may vary due to scientific judgments, mission of the organization, or use of more recently published data. Further, each organization derives values for a unique chemical list so it can be challenging to locate data on a given chemical. Two Internet resources are available to address these issues. First, the International Toxicity Estimates for Risk (ITER) database (www.tera.org/iter) provides chronic human health risk assessment data from a variety of organizations worldwide in a side-by-side format, explains differences in risk values derived by different organizations, and links directly to each organization's website for more detailed information. It is also the only database that includes risk information from independent parties whose risk values have undergone independent peer review. Second, the Risk Information Exchange (RiskIE) is a database of in progress chemical risk assessment work, and includes non-chemical information related to human health risk assessment, such as training modules, white papers and risk documents. RiskIE is available at http://www.allianceforrisk.org/RiskIE.htm, and will join ITER on National Library of Medicine's TOXNET (http://toxnet.nlm.nih.gov/). Together, ITER and RiskIE provide risk assessors essential tools for easily identifying and comparing available risk data, for sharing in progress assessments, and for enhancing interaction among risk assessment groups to decrease duplication of effort and to harmonize risk assessment procedures across organizations.

  5. An Efficient Algorithm for Perturbed Orbit Integration Combining Analytical Continuation and Modified Chebyshev Picard Iteration

    NASA Astrophysics Data System (ADS)

    Elgohary, T.; Kim, D.; Turner, J.; Junkins, J.

    2014-09-01

    Several methods exist for integrating the motion in high order gravity fields. Some recent methods use an approximate starting orbit, and an efficient method is needed for generating warm starts that account for specific low order gravity approximations. By introducing two scalar Lagrange-like invariants and employing Leibniz product rule, the perturbed motion is integrated by a novel recursive formulation. The Lagrange-like invariants allow exact arbitrary order time derivatives. Restricting attention to the perturbations due to the zonal harmonics J2 through J6, we illustrate an idea. The recursively generated vector-valued time derivatives for the trajectory are used to develop a continuation series-based solution for propagating position and velocity. Numerical comparisons indicate performance improvements of ~ 70X over existing explicit Runge-Kutta methods while maintaining mm accuracy for the orbit predictions. The Modified Chebyshev Picard Iteration (MCPI) is an iterative path approximation method to solve nonlinear ordinary differential equations. The MCPI utilizes Picard iteration with orthogonal Chebyshev polynomial basis functions to recursively update the states. The key advantages of the MCPI are as follows: 1) Large segments of a trajectory can be approximated by evaluating the forcing function at multiple nodes along the current approximation during each iteration. 2) It can readily handle general gravity perturbations as well as non-conservative forces. 3) Parallel applications are possible. The Picard sequence converges to the solution over large time intervals when the forces are continuous and differentiable. According to the accuracy of the starting solutions, however, the MCPI may require significant number of iterations and function evaluations compared to other integrators. In this work, we provide an efficient methodology to establish good starting solutions from the continuation series method; this warm start improves the performance of the MCPI significantly and will likely be useful for other applications where efficiently computed approximate orbit solutions are needed.

  6. Development of a mirror-based endoscope for divertor spectroscopy on JET with the new ITER-like wall (invited).

    PubMed

    Huber, A; Brezinsek, S; Mertens, Ph; Schweer, B; Sergienko, G; Terra, A; Arnoux, G; Balshaw, N; Clever, M; Edlingdon, T; Egner, S; Farthing, J; Hartl, M; Horton, L; Kampf, D; Klammer, J; Lambertz, H T; Matthews, G F; Morlock, C; Murari, A; Reindl, M; Riccardo, V; Samm, U; Sanders, S; Stamp, M; Williams, J; Zastrow, K D; Zauner, C

    2012-10-01

    A new endoscope with optimised divertor view has been developed in order to survey and monitor the emission of specific impurities such as tungsten and the remaining carbon as well as beryllium in the tungsten divertor of JET after the implementation of the ITER-like wall in 2011. The endoscope is a prototype for testing an ITER relevant design concept based on reflective optics only. It may be subject to high neutron fluxes as expected in ITER. The operating wavelength range, from 390 nm to 2500 nm, allows the measurements of the emission of all expected impurities (W I, Be II, C I, C II, C III) with high optical transmittance (≥ 30% in the designed wavelength range) as well as high spatial resolution that is ≤ 2 mm at the object plane and ≤ 3 mm for the full depth of field (± 0.7 m). The new optical design includes options for in situ calibration of the endoscope transmittance during the experimental campaign, which allows the continuous tracing of possible transmittance degradation with time due to impurity deposition and erosion by fast neutral particles. In parallel to the new optical design, a new type of possibly ITER relevant shutter system based on pneumatic techniques has been developed and integrated into the endoscope head. The endoscope is equipped with four digital CCD cameras, each combined with two filter wheels for narrow band interference and neutral density filters. Additionally, two protection cameras in the λ > 0.95 μm range have been integrated in the optical design for the real time wall protection during the plasma operation of JET.

  7. Low pressure and high power rf sources for negative hydrogen ions for fusion applications (ITER neutral beam injection).

    PubMed

    Fantz, U; Franzen, P; Kraus, W; Falter, H D; Berger, M; Christ-Koch, S; Fröschle, M; Gutser, R; Heinemann, B; Martens, C; McNeely, P; Riedl, R; Speth, E; Wünderlich, D

    2008-02-01

    The international fusion experiment ITER requires for the plasma heating and current drive a neutral beam injection system based on negative hydrogen ion sources at 0.3 Pa. The ion source must deliver a current of 40 A D(-) for up to 1 h with an accelerated current density of 200 Am/(2) and a ratio of coextracted electrons to ions below 1. The extraction area is 0.2 m(2) from an aperture array with an envelope of 1.5 x 0.6 m(2). A high power rf-driven negative ion source has been successfully developed at the Max-Planck Institute for Plasma Physics (IPP) at three test facilities in parallel. Current densities of 330 and 230 Am/(2) have been achieved for hydrogen and deuterium, respectively, at a pressure of 0.3 Pa and an electron/ion ratio below 1 for a small extraction area (0.007 m(2)) and short pulses (<4 s). In the long pulse experiment, equipped with an extraction area of 0.02 m(2), the pulse length has been extended to 3600 s. A large rf source, with the width and half the height of the ITER source but without extraction system, is intended to demonstrate the size scaling and plasma homogeneity of rf ion sources. The source operates routinely now. First results on plasma homogeneity obtained from optical emission spectroscopy and Langmuir probes are very promising. Based on the success of the IPP development program, the high power rf-driven negative ion source has been chosen recently for the ITER beam systems in the ITER design review process.

  8. Advanced density profile reflectometry; the state-of-the-art and measurement prospects for ITER

    NASA Astrophysics Data System (ADS)

    Doyle, E. J.

    2006-10-01

    Dramatic progress in millimeter-wave technology has allowed the realization of a key goal for ITER diagnostics, the routine measurement of the plasma density profile from millimeter-wave radar (reflectometry) measurements. In reflectometry, the measured round-trip group delay of a probe beam reflected from a plasma cutoff is used to infer the density distribution in the plasma. Reflectometer systems implemented by UCLA on a number of devices employ frequency-modulated continuous-wave (FM-CW), ultrawide-bandwidth, high-resolution radar systems. One such system on DIII-D has routinely demonstrated measurements of the density profile over a range of electron density of 0-6.4x10^19,m-3, with ˜25 μs time and ˜4 mm radial resolution, meeting key ITER requirements. This progress in performance was made possible by multiple advances in the areas of millimeter-wave technology, novel measurement techniques, and improved understanding, including: (i) fast sweep, solid-state, wide bandwidth sources and power amplifiers, (ii) dual polarization measurements to expand the density range, (iii) adaptive radar-based data analysis with parallel processing on a Unix cluster, (iv) high memory depth data acquisition, and (v) advances in full wave code modeling. The benefits of advanced system performance will be illustrated using measurements from a wide range of phenomena, including ELM and fast-ion driven mode dynamics, L-H transition studies and plasma-wall interaction. The measurement capabilities demonstrated by these systems provide a design basis for the development of the main ITER profile reflectometer system. This talk will explore the extent to which these reflectometer system designs, results and experience can be translated to ITER, and will identify what new studies and experimental tests are essential.

  9. Single-step reinitialization and extending algorithms for level-set based multi-phase flow simulations

    NASA Astrophysics Data System (ADS)

    Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.

    2017-12-01

    We propose efficient single-step formulations for reinitialization and extending algorithms, which are critical components of level-set based interface-tracking methods. The level-set field is reinitialized with a single-step (non iterative) "forward tracing" algorithm. A minimum set of cells is defined that describes the interface, and reinitialization employs only data from these cells. Fluid states are extrapolated or extended across the interface by a single-step "backward tracing" algorithm. Both algorithms, which are motivated by analogy to ray-tracing, avoid multiple block-boundary data exchanges that are inevitable for iterative reinitialization and extending approaches within a parallel-computing environment. The single-step algorithms are combined with a multi-resolution conservative sharp-interface method and validated by a wide range of benchmark test cases. We demonstrate that the proposed reinitialization method achieves second-order accuracy in conserving the volume of each phase. The interface location is invariant to reapplication of the single-step reinitialization. Generally, we observe smaller absolute errors than for standard iterative reinitialization on the same grid. The computational efficiency is higher than for the standard and typical high-order iterative reinitialization methods. We observe a 2- to 6-times efficiency improvement over the standard method for serial execution. The proposed single-step extending algorithm, which is commonly employed for assigning data to ghost cells with ghost-fluid or conservative interface interaction methods, shows about 10-times efficiency improvement over the standard method while maintaining same accuracy. Despite their simplicity, the proposed algorithms offer an efficient and robust alternative to iterative reinitialization and extending methods for level-set based multi-phase simulations.

  10. Development of a mirror-based endoscope for divertor spectroscopy on JET with the new ITER-like wall (invited)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, A.; Brezinsek, S.; Mertens, Ph.

    2012-10-15

    A new endoscope with optimised divertor view has been developed in order to survey and monitor the emission of specific impurities such as tungsten and the remaining carbon as well as beryllium in the tungsten divertor of JET after the implementation of the ITER-like wall in 2011. The endoscope is a prototype for testing an ITER relevant design concept based on reflective optics only. It may be subject to high neutron fluxes as expected in ITER. The operating wavelength range, from 390 nm to 2500 nm, allows the measurements of the emission of all expected impurities (W I, Be II,more » C I, C II, C III) with high optical transmittance ({>=}30% in the designed wavelength range) as well as high spatial resolution that is {<=}2 mm at the object plane and {<=}3 mm for the full depth of field ({+-}0.7 m). The new optical design includes options for in situ calibration of the endoscope transmittance during the experimental campaign, which allows the continuous tracing of possible transmittance degradation with time due to impurity deposition and erosion by fast neutral particles. In parallel to the new optical design, a new type of possibly ITER relevant shutter system based on pneumatic techniques has been developed and integrated into the endoscope head. The endoscope is equipped with four digital CCD cameras, each combined with two filter wheels for narrow band interference and neutral density filters. Additionally, two protection cameras in the {lambda} > 0.95 {mu}m range have been integrated in the optical design for the real time wall protection during the plasma operation of JET.« less

  11. Specialized Computer Systems for Environment Visualization

    NASA Astrophysics Data System (ADS)

    Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.

    2018-06-01

    The need for real time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware software tools for increasing the realism and efficiency of the environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes and finding intersections with Axis-Aligned Bounding Box. The proposed algorithm eliminates the branching and hence makes the algorithm more suitable to be implemented on the multi-threaded CPU and GPU. A modified ROAM algorithm is used to solve the qualitative visualization of reliefs' problems and landscapes. The algorithm is implemented on parallel systems—cluster and Compute Unified Device Architecture-networks. Results show that the implementation on MPI clusters is more efficient than Graphics Processing Unit/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of the parallel GPU system for the 3D pseudo stereo image/video synthesis are proposed. With realizing possibility analysis on a parallel GPU-architecture of each stage, 3D pseudo stereo synthesis is performed. An experimental prototype of a specialized hardware-software system 3D pseudo stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo stereo imaging to the architecture of GPU-systems is efficient. Also it accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for test GPUs.

  12. Design and synthesis of diverse functional kinked nanowire structures for nanoelectronic bioprobes.

    PubMed

    Xu, Lin; Jiang, Zhe; Qing, Quan; Mai, Liqiang; Zhang, Qingjie; Lieber, Charles M

    2013-02-13

    Functional kinked nanowires (KNWs) represent a new class of nanowire building blocks, in which functional devices, for example, nanoscale field-effect transistors (nanoFETs), are encoded in geometrically controlled nanowire superstructures during synthesis. The bottom-up control of both structure and function of KNWs enables construction of spatially isolated point-like nanoelectronic probes that are especially useful for monitoring biological systems where finely tuned feature size and structure are highly desired. Here we present three new types of functional KNWs including (1) the zero-degree KNW structures with two parallel heavily doped arms of U-shaped structures with a nanoFET at the tip of the "U", (2) series multiplexed functional KNW integrating multi-nanoFETs along the arm and at the tips of V-shaped structures, and (3) parallel multiplexed KNWs integrating nanoFETs at the two tips of W-shaped structures. First, U-shaped KNWs were synthesized with separations as small as 650 nm between the parallel arms and used to fabricate three-dimensional nanoFET probes at least 3 times smaller than previous V-shaped designs. In addition, multiple nanoFETs were encoded during synthesis in one of the arms/tip of V-shaped and distinct arms/tips of W-shaped KNWs. These new multiplexed KNW structures were structurally verified by optical and electron microscopy of dopant-selective etched samples and electrically characterized using scanning gate microscopy and transport measurements. The facile design and bottom-up synthesis of these diverse functional KNWs provides a growing toolbox of building blocks for fabricating highly compact and multiplexed three-dimensional nanoprobes for applications in life sciences, including intracellular and deep tissue/cell recordings.

  13. Error Control Coding Techniques for Space and Satellite Communications

    NASA Technical Reports Server (NTRS)

    Costello, Daniel J., Jr.; Takeshita, Oscar Y.; Cabral, Hermano A.

    1998-01-01

    It is well known that the BER performance of a parallel concatenated turbo-code improves roughly as 1/N, where N is the information block length. However, it has been observed by Benedetto and Montorsi that for most parallel concatenated turbo-codes, the FER performance does not improve monotonically with N. In this report, we study the FER of turbo-codes, and the effects of their concatenation with an outer code. Two methods of concatenation are investigated: across several frames and within each frame. Some asymmetric codes are shown to have excellent FER performance with an information block length of 16384. We also show that the proposed outer coding schemes can improve the BER performance as well by eliminating pathological frames generated by the iterative MAP decoding process.

  14. Distributed Optimal Power Flow of AC/DC Interconnected Power Grid Using Synchronous ADMM

    NASA Astrophysics Data System (ADS)

    Liang, Zijun; Lin, Shunjiang; Liu, Mingbo

    2017-05-01

    Distributed optimal power flow (OPF) is of great importance and challenge to AC/DC interconnected power grid with different dispatching centres, considering the security and privacy of information transmission. In this paper, a fully distributed algorithm for OPF problem of AC/DC interconnected power grid called synchronous ADMM is proposed, and it requires no form of central controller. The algorithm is based on the fundamental alternating direction multiplier method (ADMM), by using the average value of boundary variables of adjacent regions obtained from current iteration as the reference values of both regions for next iteration, which realizes the parallel computation among different regions. The algorithm is tested with the IEEE 11-bus AC/DC interconnected power grid, and by comparing the results with centralized algorithm, we find it nearly no differences, and its correctness and effectiveness can be validated.

  15. Present limits and improvements of structural materials for fusion reactors - a review

    NASA Astrophysics Data System (ADS)

    Tavassoli, A.-A. F.

    2002-04-01

    Since the transition from ITER or DEMO to a commercial power reactor would involve a significant change in system and materials options, a parallel R&D path has been put in place in Europe to address these issues. This paper assesses the structural materials part of this program along with the latest R&D results from the main programs. It is shown that stainless steels and ferritic/martensitic steels, retained for ITER and DEMO, will also remain the principal contenders for the future FPR, despite uncertainties over irradiation induced embrittlement at low temperatures and consequences of high He/dpa ratio. Neither one of the present advanced high temperature materials has to this date the structural integrity reliability needed for application in critical components. This situation is unlikely to change with the materials R&D alone and has to be mitigated in close collaboration with blanket system design.

  16. Beam ion acceleration by ICRH in JET discharges

    NASA Astrophysics Data System (ADS)

    Budny, R. V.; Gorelenkova, M.; Bertelli, N.; JET Collaboration

    2015-11-01

    The ion Monte-Carlo orbit integrator NUBEAM, used in TRANSP has been enhanced to include an ``RF-kick'' operator to simulate the interaction of RF fields and fast ions. The RF quasi-linear operator (localized in space) uses a second R-Z orbit integrator. We apply this to analysis of recent JET discharges using ICRH with the ITER-like first wall. An example of results for a high performance Hybrid discharge for which standard TRANSP analysis simulated the DD neutron emission rate below measurements, re-analysis using the RF-kick operator results in increased beam parallel and perpendicular energy densities (~=40% and 15% respectively), and increased beam-thermal neutron emission (~= 35%), making the total rate closer to the measurement. Checks of the numerics, comparisons with measurements, and ITER implications will be presented. Supported in part by the US DoE contract DE-AC02-09CH11466 and by EUROfusion No 633053.

  17. Dynamic phasing of multichannel cw laser radiation by means of a stochastic gradient algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Volkov, V A; Volkov, M V; Garanin, S G

    2013-09-30

    The phasing of a multichannel laser beam by means of an iterative stochastic parallel gradient (SPG) algorithm has been numerically and experimentally investigated. The operation of the SPG algorithm is simulated, the acceptable range of amplitudes of probe phase shifts is found, and the algorithm parameters at which the desired Strehl number can be obtained with a minimum number of iterations are determined. An experimental bench with phase modulators based on lithium niobate, which are controlled by a multichannel electronic unit with a real-time microcontroller, has been designed. Phasing of 16 cw laser beams at a system response bandwidth ofmore » 3.7 kHz and phase thermal distortions in a frequency band of about 10 Hz is experimentally demonstrated. The experimental data are in complete agreement with the calculation results. (control of laser radiation parameters)« less

  18. A high data rate universal lattice decoder on FPGA

    NASA Astrophysics Data System (ADS)

    Ma, Jing; Huang, Xinming; Kura, Swapna

    2005-06-01

    This paper presents the architecture design of a high data rate universal lattice decoder for MIMO channels on FPGA platform. A phost strategy based lattice decoding algorithm is modified in this paper to reduce the complexity of the closest lattice point search. The data dependency of the improved algorithm is examined and a parallel and pipeline architecture is developed with the iterative decoding function on FPGA and the division intensive channel matrix preprocessing on DSP. Simulation results demonstrate that the improved lattice decoding algorithm provides better bit error rate and less iteration number compared with the original algorithm. The system prototype of the decoder shows that it supports data rate up to 7Mbit/s on a Virtex2-1000 FPGA, which is about 8 times faster than the original algorithm on FPGA platform and two-orders of magnitude better than its implementation on a DSP platform.

  19. A 3D finite-difference BiCG iterative solver with the Fourier-Jacobi preconditioner for the anisotropic EIT/EEG forward problem.

    PubMed

    Turovets, Sergei; Volkov, Vasily; Zherdetsky, Aleksej; Prakonina, Alena; Malony, Allen D

    2014-01-01

    The Electrical Impedance Tomography (EIT) and electroencephalography (EEG) forward problems in anisotropic inhomogeneous media like the human head belongs to the class of the three-dimensional boundary value problems for elliptic equations with mixed derivatives. We introduce and explore the performance of several new promising numerical techniques, which seem to be more suitable for solving these problems. The proposed numerical schemes combine the fictitious domain approach together with the finite-difference method and the optimally preconditioned Conjugate Gradient- (CG-) type iterative method for treatment of the discrete model. The numerical scheme includes the standard operations of summation and multiplication of sparse matrices and vector, as well as FFT, making it easy to implement and eligible for the effective parallel implementation. Some typical use cases for the EIT/EEG problems are considered demonstrating high efficiency of the proposed numerical technique.

  20. Weapon System Costing Methodology for Aircraft Airframes and Basic Structures. Volume I. Technical Volume

    DTIC Science & Technology

    1975-06-01

    the Air Force Flight Dynamics Laboratory for use in conceptual and preliminary designs pauses of weapon system development. The methods are a...trade study method provides ai\\ iterative capability stemming from a direct interface with design synthesis programs. A detailed cost data base ;ind...system for data expmjsion is provided. The methods are designed for ease in changing cost estimating relationships and estimating coefficients

  1. Program Aids Analysis And Optimization Of Design

    NASA Technical Reports Server (NTRS)

    Rogers, James L., Jr.; Lamarsh, William J., II

    1994-01-01

    NETS/ PROSSS (NETS Coupled With Programming System for Structural Synthesis) computer program developed to provide system for combining NETS (MSC-21588), neural-network application program and CONMIN (Constrained Function Minimization, ARC-10836), optimization program. Enables user to reach nearly optimal design. Design then used as starting point in normal optimization process, possibly enabling user to converge to optimal solution in significantly fewer iterations. NEWT/PROSSS written in C language and FORTRAN 77.

  2. Saturated mutagenesis of ketoisovalerate decarboxylase V461 enabled specific synthesis of 1-pentanol via the ketoacid elongation cycle.

    PubMed

    Chen, Grey S; Siao, Siang Wun; Shen, Claire R

    2017-09-12

    Iterative ketoacid elongation has been an essential tool in engineering artificial metabolism, in particular the synthetic alcohols. However, precise control of product specificity is still greatly challenged by the substrate promiscuity of the ketoacid decarboxylase, which unselectively hijacks ketoacid intermediates from the elongation cycle along with the target ketoacid. In this work, preferential tuning of the Lactococcus lactis ketoisovalerate decarboxylase (Kivd) specificity toward 1-pentanol synthesis was achieved via saturated mutagenesis of the key residue V461 followed by screening of the resulting alcohol spectrum. Substitution of V461 with the small and polar amino acid glycine or serine significantly improved the Kivd selectivity toward the 1-pentanol precursor 2-ketocaproate by lowering its catalytic efficiency for the upstream ketoacid 2-ketobutyrate and 2-ketovalerate. Conversely, replacing V461 with bulky or charged side chains displayed severely adverse effect. Increasing supply of the iterative addition unit acetyl-CoA by acetate feeding further drove 2-ketoacid flux into the elongation cycle and enhanced 1-pentanol productivity. The Kivd V461G variant enabled a 1-pentanol production specificity around 90% of the total alcohol content with or without oleyl alcohol extraction. This work adds insight to the selectivity of Kivd active site.

  3. Scalable and balanced dynamic hybrid data assimilation

    NASA Astrophysics Data System (ADS)

    Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

    2017-04-01

    Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them implemented as parallel model runs themselves. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs which requires a very efficient and low-latency communication network. However, the volume of data communicated is small and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of scalable VEnKF with the 4D lake and shallow sea model COHERENS, assimilating simultaneously continuous in situ measurements in a single point and infrequent satellite images that cover a whole lake, with the fully scalable VEnKF.

  4. Compensation for spectacle lenses involves changes in proteoglycan synthesis in both the sclera and choroid.

    PubMed

    Nickla, D L; Wildsoet, C; Wallman, J

    1997-04-01

    It has been demonstrated that chick eye growth compensates for defocus imposed by spectacle lenses: the eye elongates in response to hyperopic defocus imposed by negative lenses and slows its elongation in response to myopic defocus imposed by positive lenses. We ask whether the synthesis of scleral extracellular matrix, specifically glycosaminoglycans, changes in parallel with the changes in ocular elongation. In addition, there is a choroidal component to compensation for spectacle lenses; the choroid thickens in response to myopic defocus and thins in response to hyperopic defocus. We ask whether choroidal glycosaminoglycan synthesis changes in parallel with changes in choroidal thickness. Chicks wore either a +15 diopter (D) or -15 D spectacle lens over one eye, or they wore one lens of each power over each eye for 5 days. At the end of this period, we measured refractive errors and ocular dimensions by refractometry and A-scan ultrasonography, respectively. Pieces of the scleras and choroids from these eyes were put into culture and the synthesis of glycosaminoglycans was assessed by measuring the incorporation of radioactive inorganic sulfur. We here report that the compensatory modulation of the length of the eye involves changes in the synthesis of glycosaminoglycans in the sclera, with synthesis increasing in eyes wearing -15 D spectacles lenses and decreasing in eyes wearing +15 D lenses. In addition, changes in the synthesis of glycosaminoglycans in the choroid are correlated with changes in choroidal thickness: eyes wearing +15 D lenses develop thicker choroids and these choroids synthesize more glycosaminoglycans than choroids from eyes wearing -15 D lenses. Changes in scleral glycosaminoglycan synthesis accompany lens-induced changes in the length of the eye. Furthermore, changes in the thickness of the choroid are also associated with changes in the synthesis of glycosaminoglycans. These results are consistent with the regulation of the growth of the eye being bidirectional, and with the retina being able to sense the sign of defocus.

  5. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Qiaofeng; Sawatzky, Alex; Anastasio, Mark A., E-mail: anastasio@wustl.edu

    Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that ismore » solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets.« less

  6. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction.

    PubMed

    Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A

    2016-04-01

    The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets.

  7. Accelerated fast iterative shrinkage thresholding algorithms for sparsity-regularized cone-beam CT image reconstruction

    PubMed Central

    Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A.

    2016-01-01

    Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets. PMID:27036582

  8. SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Palamuttam, R. S.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; Verma, R.; Waliser, D. E.; Lee, H.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning fast Big Data technology called SciSpark based on ApacheTM Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based ApacheTM Hadoop by 100x in memory and by 10x on disk. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 10 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesocale Convective Complexes. We have implemented a parallel data ingest capability in which the user specifies desired variables (arrays) as several time-sorted lists of URL's (i.e. using OPeNDAP model.nc?varname, or local files). The specified variables are partitioned by time/space and then each Spark node pulls its bundle of arrays into memory to begin a computation pipeline. We also investigated the performance of several N-dim. array libraries (scala breeze, java jblas & netlib-java, and ND4J). We are currently developing science codes using ND4J and studying memory behavior on the JVM. On the pyspark side, many of our science codes already use the numpy and SciPy ecosystems. The talk will cover: the architecture of SciSpark, the design of the scientific RDD (sRDD) data structure, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDF's, and first metrics quantifying parallel speedups and memory & disk usage.

  9. Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluate, partition, processor reassignment, cost evaluation, and decision. Jove running on a single EBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.

  10. Dynamic Load Balancing For Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluate, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.

  11. Rapid determination of enantiomeric excess: a focus on optical approaches.

    PubMed

    Leung, Diana; Kang, Sung Ok; Anslyn, Eric V

    2012-01-07

    High-throughput screening (HTS) methods are becoming increasingly essential in discovering chiral catalysts or auxiliaries for asymmetric transformations due to the advent of parallel synthesis and combinatorial chemistry. Both parallel synthesis and combinatorial chemistry can lead to the exploration of a range of structural candidates and reaction conditions as a means to obtain the highest enantiomeric excess (ee) of a desired transformation. One current bottleneck in these approaches to asymmetric reactions is the determination of ee, which has led researchers to explore a wide range of HTS techniques. To be truly high-throughput, it has been proposed that a technique that can analyse a thousand or more samples per day is needed. Many of the current approaches to this goal are based on optical methods because they allow for a rapid determination of ee due to quick data collection and their parallel analysis capabilities. In this critical review these techniques are reviewed with a discussion of their respective advantages and drawbacks, and with a contrast to chromatographic methods (180 references). This journal is © The Royal Society of Chemistry 2012

  12. A Fusion Nuclear Science Facility for a fast-track path to DEMO

    DOE PAGES

    Garofalo, Andrea M.; Abdou, M.; Canik, John M.; ...

    2014-10-01

    An accelerated fusion energy development program, a “fast-track” approach, requires developing an understanding of fusion nuclear science (FNS) in parallel with research on ITER to study burning plasmas. A Fusion Nuclear Science Facility (FNSF) in parallel with ITER provides the capability to resolve FNS feasibility issues related to power extraction, tritium fuel sustainability, and reliability, and to begin construction of DEMO upon the achievement of Q~10 in ITER. Fusion nuclear components, including the first wall (FW)/blanket, divertor, heating/fueling systems, etc. are complex systems with many inter-related functions and different materials, fluids, and physical interfaces. These in-vessel nuclear components must operatemore » continuously and reliably with: (a) Plasma exposure, surface particle & radiation loads, (b) High energy 2 neutron fluxes and their interactions in materials (e.g. peaked volumetric heating with steep gradients, tritium production, activation, atomic displacements, gas production, etc.), (c) Strong magnetic fields with temporal and spatial variations (electromagnetic coupling to the plasma including off-normal events like disruptions), and (d) a High temperature, high vacuum, chemically active environment. While many of these conditions and effects are being studied with separate and multiple effect experimental test stands and modeling, fusion nuclear conditions cannot be completely simulated outside the fusion environment. This means there are many new multi-physics, multi-scale phenomena and synergistic effects yet to be discovered and accounted for in the understanding, design and operation of fusion as a self-sustaining, energy producing system, and significant experimentation and operational experience in a true fusion environment is an essential requirement. In the following sections we discuss the FNSF objectives, describe the facility requirements and a facility concept and operation approach that can accomplish those objectives, and assess the readiness to construct with respect to several key FNSF issues: materials, steady-state operation, disruptions, power exhaust, and breeding blanket. Finally we present our conclusions.« less

  13. Project APhiD: A Lorenz-gauged A-Φ decomposition for parallelized computation of ultra-broadband electromagnetic induction in a fully heterogeneous Earth

    NASA Astrophysics Data System (ADS)

    Weiss, Chester J.

    2013-08-01

    An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10-2-1010 Hz and range 10-3-105 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.

  14. Study of noise propagation and the effects of insufficient numbers of projection angles and detector samplings for iterative reconstruction using planar-integral data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, B.; Zeng, G. L.

    2006-09-15

    A rotating slat collimator can be used to acquire planar-integral data. It achieves higher geometric efficiency than a parallel-hole collimator by accepting more photons, but the planar-integral data contain less tomographic information that may result in larger noise amplification in the reconstruction. Lodge evaluated the rotating slat system and the parallel-hole system based on noise behavior for an FBP reconstruction. Here, we evaluate the noise propagation properties of the two collimation systems for iterative reconstruction. We extend Huesman's noise propagation analysis of the line-integral system to the planar-integral case, and show that approximately 2.0(D/dp) SPECT angles, 2.5(D/dp) self-spinning angles atmore » each detector position, and a 0.5dp detector sampling interval are required in order for the planar-integral data to be efficiently utilized. Here, D is the diameter of the object and dp is the linear dimension of the voxels that subdivide the object. The noise propagation behaviors of the two systems are then compared based on a least-square reconstruction using the ratio of the SNR in the image reconstructed using a planar-integral system to that reconstructed using a line-integral system. The ratio is found to be proportional to {radical}(F/D), where F is a geometric efficiency factor. This result has been verified by computer simulations. It confirms that for an iterative reconstruction, the noise tradeoff of the two systems is not only dependent on the increase of the geometric efficiency afforded by the planar projection method, but also dependent on the size of the object. The planar-integral system works better for small objects, while the line-integral system performs better for large ones. This result is consistent with Lodge's results based on the FBP method.« less

  15. Synthesis of a drug-like focused library of trisubstituted pyrrolidines using integrated flow chemistry and batch methods.

    PubMed

    Baumann, Marcus; Baxendale, Ian R; Kuratli, Christoph; Ley, Steven V; Martin, Rainer E; Schneider, Josef

    2011-07-11

    A combination of flow and batch chemistries has been successfully applied to the assembly of a series of trisubstituted drug-like pyrrolidines. This study demonstrates the efficient preparation of a focused library of these pharmaceutically important structures using microreactor technologies, as well as classical parallel synthesis techniques, and thus exemplifies the impact of integrating innovative enabling tools within the drug discovery process.

  16. Robust lateral blended-wing-body aircraft feedback control design using a parameterized LFR model and DGK-iteration

    NASA Astrophysics Data System (ADS)

    Schirrer, A.; Westermayer, C.; Hemedi, M.; Kozek, M.

    2013-12-01

    This paper shows control design results, performance, and limitations of robust lateral control law designs based on the DGK-iteration mixed-μ-synthesis procedure for a large, flexible blended wing body (BWB) passenger aircraft. The aircraft dynamics is preshaped by a low-complexity inner loop control law providing stabilization, basic response shaping, and flexible mode damping. The μ controllers are designed to further improve vibration damping of the main flexible modes by exploiting the structure of the arising significant parameter-dependent plant variations. This is achieved by utilizing parameterized Linear Fractional Representations (LFR) of the aircraft rigid and flexible dynamics. Designs with various levels of LFR complexity are carried out and discussed, showing the achieved performance improvement over the initial controller and their robustness and complexity properties.

  17. Reconstruction of multiple-pinhole micro-SPECT data using origin ensembles.

    PubMed

    Lyon, Morgan C; Sitek, Arkadiusz; Metzler, Scott D; Moore, Stephen C

    2016-10-01

    The authors are currently developing a dual-resolution multiple-pinhole microSPECT imaging system based on three large NaI(Tl) gamma cameras. Two multiple-pinhole tungsten collimator tubes will be used sequentially for whole-body "scout" imaging of a mouse, followed by high-resolution (hi-res) imaging of an organ of interest, such as the heart or brain. Ideally, the whole-body image will be reconstructed in real time such that data need only be acquired until the area of interest can be visualized well-enough to determine positioning for the hi-res scan. The authors investigated the utility of the origin ensemble (OE) algorithm for online and offline reconstructions of the scout data. This algorithm operates directly in image space, and can provide estimates of image uncertainty, along with reconstructed images. Techniques for accelerating the OE reconstruction were also introduced and evaluated. System matrices were calculated for our 39-pinhole scout collimator design. SPECT projections were simulated for a range of count levels using the MOBY digital mouse phantom. Simulated data were used for a comparison of OE and maximum-likelihood expectation maximization (MLEM) reconstructions. The OE algorithm convergence was evaluated by calculating the total-image entropy and by measuring the counts in a volume-of-interest (VOI) containing the heart. Total-image entropy was also calculated for simulated MOBY data reconstructed using OE with various levels of parallelization. For VOI measurements in the heart, liver, bladder, and soft-tissue, MLEM and OE reconstructed images agreed within 6%. Image entropy converged after ∼2000 iterations of OE, while the counts in the heart converged earlier at ∼200 iterations of OE. An accelerated version of OE completed 1000 iterations in <9 min for a 6.8M count data set, with some loss of image entropy performance, whereas the same dataset required ∼79 min to complete 1000 iterations of conventional OE. A combination of the two methods showed decreased reconstruction time and no loss of performance when compared to conventional OE alone. OE-reconstructed images were found to be quantitatively and qualitatively similar to MLEM, yet OE also provided estimates of image uncertainty. Some acceleration of the reconstruction can be gained through the use of parallel computing. The OE algorithm is useful for reconstructing multiple-pinhole SPECT data and can be easily modified for real-time reconstruction.

  18. Computer Synthesis Approaches of Hyperboloid Gear Drives with Linear Contact

    NASA Astrophysics Data System (ADS)

    Abadjiev, Valentin; Kawasaki, Haruhisa

    2014-09-01

    The computer design has improved forming different type software for scientific researches in the field of gearing theory as well as performing an adequate scientific support of the gear drives manufacture. Here are attached computer programs that are based on mathematical models as a result of scientific researches. The modern gear transmissions require the construction of new mathematical approaches to their geometric, technological and strength analysis. The process of optimization, synthesis and design is based on adequate iteration procedures to find out an optimal solution by varying definite parameters. The study is dedicated to accepted methodology in the creation of soft- ware for the synthesis of a class high reduction hyperboloid gears - Spiroid and Helicon ones (Spiroid and Helicon are trademarks registered by the Illinois Tool Works, Chicago, Ill). The developed basic computer products belong to software, based on original mathematical models. They are based on the two mathematical models for the synthesis: "upon a pitch contact point" and "upon a mesh region". Computer programs are worked out on the basis of the described mathematical models, and the relations between them are shown. The application of the shown approaches to the synthesis of commented gear drives is illustrated.

  19. Exact Synthesis of Reversible Circuits Using A* Algorithm

    NASA Astrophysics Data System (ADS)

    Datta, K.; Rathi, G. K.; Sengupta, I.; Rahaman, H.

    2015-06-01

    With the growing emphasis on low-power design methodologies, and the result that theoretical zero power dissipation is possible only if computations are information lossless, design and synthesis of reversible logic circuits have become very important in recent years. Reversible logic circuits are also important in the context of quantum computing, where the basic operations are reversible in nature. Several synthesis methodologies for reversible circuits have been reported. Some of these methods are termed as exact, where the motivation is to get the minimum-gate realization for a given reversible function. These methods are computationally very intensive, and are able to synthesize only very small functions. There are other methods based on function transformations or higher-level representation of functions like binary decision diagrams or exclusive-or sum-of-products, that are able to handle much larger circuits without any guarantee of optimality or near-optimality. Design of exact synthesis algorithms is interesting in this context, because they set some kind of benchmarks against which other methods can be compared. This paper proposes an exact synthesis approach based on an iterative deepening version of the A* algorithm using the multiple-control Toffoli gate library. Experimental results are presented with comparisons with other exact and some heuristic based synthesis approaches.

  20. Multi-variants synthesis of Petri nets for FPGA devices

    NASA Astrophysics Data System (ADS)

    Bukowiec, Arkadiusz; Doligalski, Michał

    2015-09-01

    There is presented new method of synthesis of application specific logic controllers for FPGA devices. The specification of control algorithm is made with use of control interpreted Petri net (PT type). It allows specifying parallel processes in easy way. The Petri net is decomposed into state-machine type subnets. In this case, each subnet represents one parallel process. For this purpose there are applied algorithms of coloring of Petri nets. There are presented two approaches of such decomposition: with doublers of macroplaces or with one global wait place. Next, subnets are implemented into two-level logic circuit of the controller. The levels of logic circuit are obtained as a result of its architectural decomposition. The first level combinational circuit is responsible for generation of next places and second level decoder is responsible for generation output symbols. There are worked out two variants of such circuits: with one shared operational memory or with many flexible distributed memories as a decoder. Variants of Petri net decomposition and structures of logic circuits can be combined together without any restrictions. It leads to existence of four variants of multi-variants synthesis.

  1. Parallel Computer System for 3D Visualization Stereo on GPU

    NASA Astrophysics Data System (ADS)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.

  2. MO-G-17A-02: Computer Simulation Studies for On-Board Functional and Molecular Imaging of the Prostate Using a Robotic Multi-Pinhole SPECT System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, L; Duke University Medical Center, Durham, NC; Fudan University Shanghai Cancer Center, Shanghai

    Purpose: To investigate prostate imaging onboard radiation therapy machines using a novel robotic, 49-pinhole Single Photon Emission Computed Tomography (SPECT) system. Methods: Computer-simulation studies were performed for region-of-interest (ROI) imaging using a 49-pinhole SPECT collimator and for broad cross-section imaging using a parallel-hole SPECT collimator. A male XCAT phantom was computersimulated in supine position with one 12mm-diameter tumor added in the prostate. A treatment couch was added to the phantom. Four-minute detector trajectories for imaging a 7cm-diameter-sphere ROI encompassing the tumor were investigated with different parameters, including pinhole focal length, pinhole diameter and trajectory starting angle. Pseudo-random Poisson noise wasmore » included in the simulated projection data, and SPECT images were reconstructed by OSEM with 4 subsets and up to 10 iterations. Images were evaluated by visual inspection, profiles, and Root-Mean- Square-Error (RMSE). Results: The tumor was well visualized above background by the 49-pinhole SPECT system with different pinhole parameters while it was not visible with parallel-hole SPECT imaging. Minimum RMSEs were 0.30 for 49-pinhole imaging and 0.41 for parallelhole imaging. For parallel-hole imaging, the detector trajectory from rightto- left yielded slightly lower RMSEs than that from posterior to anterior. For 49-pinhole imaging, near-minimum RMSEs were maintained over a broader range of OSEM iterations with a 5mm pinhole diameter and 21cm focal length versus a 2mm diameter pinhole and 18cm focal length. The detector with 21cm pinhole focal length had the shortest rotation radius averaged over the trajectory. Conclusion: On-board functional and molecular prostate imaging may be feasible in 4-minute scan times by robotic SPECT. A 49-pinhole SPECT system could improve such imaging as compared to broadcross-section parallel-hole collimated SPECT imaging. Multi-pinhole imaging can be improved by considering pinhole focal length, pinhole diameter, and trajectory starting angle. The project is supported by the NIH grant 5R21-CA156390.« less

  3. Monte Carlo and Molecular Dynamics in the Multicanonical Ensemble: Connections between Wang-Landau Sampling and Metadynamics

    NASA Astrophysics Data System (ADS)

    Vogel, Thomas; Perez, Danny; Junghans, Christoph

    2014-03-01

    We show direct formal relationships between the Wang-Landau iteration [PRL 86, 2050 (2001)], metadynamics [PNAS 99, 12562 (2002)] and statistical temperature molecular dynamics [PRL 97, 050601 (2006)], the major Monte Carlo and molecular dynamics work horses for sampling from a generalized, multicanonical ensemble. We aim at helping to consolidate the developments in the different areas by indicating how methodological advancements can be transferred in a straightforward way, avoiding the parallel, largely independent, developments tracks observed in the past.

  4. An improved Newton iteration for the generalized inverse of a matrix, with applications

    NASA Technical Reports Server (NTRS)

    Pan, Victor; Schreiber, Robert

    1990-01-01

    The purpose here is to clarify and illustrate the potential for the use of variants of Newton's method of solving problems of practical interest on highly personal computers. The authors show how to accelerate the method substantially and how to modify it successfully to cope with ill-conditioned matrices. The authors conclude that Newton's method can be of value for some interesting computations, especially in parallel and other computing environments in which matrix products are especially easy to work with.

  5. Density-matrix-based algorithm for solving eigenvalue problems

    NASA Astrophysics Data System (ADS)

    Polizzi, Eric

    2009-03-01

    A fast and stable numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques and takes its inspiration from the contour integration and density-matrix representation in quantum mechanics. It will be shown that this algorithm—named FEAST—exhibits high efficiency, robustness, accuracy, and scalability on parallel architectures. Examples from electronic structure calculations of carbon nanotubes are presented, and numerical performances and capabilities are discussed.

  6. TMAP: Tübingen NLTE Model-Atmosphere Package

    NASA Astrophysics Data System (ADS)

    Werner, Klaus; Dreizler, Stefan; Rauch, Thomas

    2012-12-01

    The Tübingen NLTE Model-Atmosphere Package (TMAP) is a tool to calculate stellar atmospheres in spherical or plane-parallel geometry in hydrostatic and radiative equilibrium allowing departures from local thermodynamic equilibrium (LTE) for the population of atomic levels. It is based on the Accelerated Lambda Iteration (ALI) method and is able to account for line blanketing by metals. All elements from hydrogen to nickel may be included in the calculation with model atoms which are tailored for the aims of the user.

  7. A Hardware-Accelerated Quantum Monte Carlo framework (HAQMC) for N-body systems

    NASA Astrophysics Data System (ADS)

    Gothandaraman, Akila; Peterson, Gregory D.; Warren, G. Lee; Hinde, Robert J.; Harrison, Robert J.

    2009-12-01

    Interest in the study of structural and energetic properties of highly quantum clusters, such as inert gas clusters has motivated the development of a hardware-accelerated framework for Quantum Monte Carlo simulations. In the Quantum Monte Carlo method, the properties of a system of atoms, such as the ground-state energies, are averaged over a number of iterations. Our framework is aimed at accelerating the computations in each iteration of the QMC application by offloading the calculation of properties, namely energy and trial wave function, onto reconfigurable hardware. This gives a user the capability to run simulations for a large number of iterations, thereby reducing the statistical uncertainty in the properties, and for larger clusters. This framework is designed to run on the Cray XD1 high performance reconfigurable computing platform, which exploits the coarse-grained parallelism of the processor along with the fine-grained parallelism of the reconfigurable computing devices available in the form of field-programmable gate arrays. In this paper, we illustrate the functioning of the framework, which can be used to calculate the energies for a model cluster of helium atoms. In addition, we present the capabilities of the framework that allow the user to vary the chemical identities of the simulated atoms. Program summaryProgram title: Hardware Accelerated Quantum Monte Carlo (HAQMC) Catalogue identifier: AEEP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEP_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 691 537 No. of bytes in distributed program, including test data, etc.: 5 031 226 Distribution format: tar.gz Programming language: C/C++ for the QMC application, VHDL and Xilinx 8.1 ISE/EDK tools for FPGA design and development Computer: Cray XD1 consisting of a dual-core, dualprocessor AMD Opteron 2.2 GHz with a Xilinx Virtex-4 (V4LX160) or Xilinx Virtex-II Pro (XC2VP50) FPGA per node. We use the compute node with the Xilinx Virtex-4 FPGA Operating system: Red Hat Enterprise Linux OS Has the code been vectorised or parallelized?: Yes Classification: 6.1 Nature of problem: Quantum Monte Carlo is a practical method to solve the Schrödinger equation for large many-body systems and obtain the ground-state properties of such systems. This method involves the sampling of a number of configurations of atoms and averaging the properties of the configurations over a number of iterations. We are interested in applying the QMC method to obtain the energy and other properties of highly quantum clusters, such as inert gas clusters. Solution method: The proposed framework provides a combined hardware-software approach, in which the QMC simulation is performed on the host processor, with the computationally intensive functions such as energy and trial wave function computations mapped onto the field-programmable gate array (FPGA) logic device attached as a co-processor to the host processor. We perform the QMC simulation for a number of iterations as in the case of our original software QMC approach, to reduce the statistical uncertainty of the results. However, our proposed HAQMC framework accelerates each iteration of the simulation, by significantly reducing the time taken to calculate the ground-state properties of the configurations of atoms, thereby accelerating the overall QMC simulation. We provide a generic interpolation framework that can be extended to study a variety of pure and doped atomic clusters, irrespective of the chemical identities of the atoms. For the FPGA implementation of the properties, we use a two-region approach for accurately computing the properties over the entire domain, employ deep pipelines and fixed-point for all our calculations guaranteeing the accuracy required for our simulation.

  8. GPU-based parallel algorithm for blind image restoration using midfrequency-based methods

    NASA Astrophysics Data System (ADS)

    Xie, Lang; Luo, Yi-han; Bao, Qi-liang

    2013-08-01

    GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for GPU hardware architecture is of great significance. In order to solve the problem of high computational complexity and poor real-time performance in blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is also used to restore the image hardly with any recursion or iteration. Combining the algorithm with data intensiveness, data parallel computing and GPU execution model of single instruction and multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for stream computing of GPU. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and midfrequency-based filtering. Aiming at better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in frequency domain after the optimization of data access and the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data to ensure the transmission rate to get around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.

  9. Stimulation of ATP synthesis in Halobacterium halobium R1 by light-induced or artifically created proton electrochemical potential gradients across the cell membrane.

    PubMed

    Danon, A; Caplan, S R

    1976-01-15

    The relationship between proton movement and phosphorylation in Halo-bacterium halobium R1 has been investigated under anaerobic conditions. The light-induced changes in the bacteriorhodopsin are accompanied by proton movements across the membrane which result in pH changes in the suspending medium. The initial alkaline shift is shown to be closely paralleled by (and hence correlated with) ATP synthesis. Acidification of the medium in the presence of valinomycin, under conditions of low external potassium, brings about ATP synthesis in the dark.

  10. Solution-phase parallel synthesis of hexahydro-1H-isoindolone libraries via tactical combination of Cu-catalyzed three-component coupling and Diels-Alder reactions.

    PubMed

    Zhang, Lei; Lushington, Gerald H; Neuenswander, Benjamin; Hershberger, John C; Malinakova, Helena C

    2008-01-01

    Parallel solution-phase synthesis of combinatorial libraries of hexahydro-1 H-isoindolones exploiting a novel "tactical combination" of Cu-catalyzed three-component coupling and Diels-Alder reactions was accomplished. Three distinct libraries consisting of 24 members (library I), 60 members (library II), and 32 members (library III) were constructed. Variation of three substituents on the isoindolone scaffold in library I was exclusively achieved by the choice of the building blocks. In the syntheses of libraries II and III, sublibraries of isoindolone scaffolds were prepared initially in a one-pot/two-step process and were further diversified via Pd-catalyzed Suzuki cross-coupling reaction with boronic acids at two different diversification points. The Lipinski profiles and calculated ADME properties of the compounds are also reported.

  11. Algorithmic synthesis using Python compiler

    NASA Astrophysics Data System (ADS)

    Cieszewski, Radoslaw; Romaniuk, Ryszard; Pozniak, Krzysztof; Linczuk, Maciej

    2015-09-01

    This paper presents a python to VHDL compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and translate it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the programmed circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. This can be achieved by using many computational resources at the same time. Creating parallel programs implemented in FPGAs in pure HDL is difficult and time consuming. Using higher level of abstraction and High-Level Synthesis compiler implementation time can be reduced. The compiler has been implemented using the Python language. This article describes design, implementation and results of created tools.

  12. Quantum Approaches to Logic Circuit Synthesis and Testing

    DTIC Science & Technology

    2006-06-01

    with n qubits using Octave (Oct), MATLAB (MAT), Blitz++ (B++) and QuIDDPro (QP) with Oracle Design 1. 42 Table 4: Simulating Grover’s...algorithm with n qubits using Octave (Oct), MATLAB (MAT), Blitz++ (B++) and QuIDDPro (QP) with Oracle Design 2. 43 Table 5: Number of Grover iterations...to accurately characterize the effects of gate and systematic error in a quantum circuit that generates remotely entangled EPR pairs. An

  13. Synthesis of a Comprehensive Polyprenol Library for Evaluation of Bacterial Enzyme Lipid Substrate Specificity.

    PubMed

    Wu, Baolin; Woodward, Robert; Wen, Liuqing; Wang, Xuan; Zhao, Guohui; Wang, Peng George

    2013-12-01

    Polyprenols, a type of universal glycan lipid carrier, play important roles for glycan bio-assembly in wide variety of living systems. Chemical synthesis of natural polyisoprenols such as undecaprenol and dolichols, but especially their homologs, could serves as a powerful molecular tool to dissect and define the functions of enzymes involved in glycan biosynthesis. In this paper, we report an efficient and reliable method to construct this type of hydrophoic molecule through a base-mediated iterative coupling approach using a key bifunctional ( Z , Z )-diisoprenyl building block. The ligation with N -acetyl-D-glactosamine (GalNAc) with a set of the synthesized lipid analogs forming polyprenol pyrophosphate linked GalNAc (GalNAc-PP-lipid) conjugates is also demonstrated.

  14. DGDFT: A massively parallel method for large scale density functional theory calculations.

    PubMed

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  15. An information-theoretic approach to motor action decoding with a reconfigurable parallel architecture.

    PubMed

    Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C

    2011-01-01

    Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.

  16. Legacy Code Modernization

    NASA Technical Reports Server (NTRS)

    Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization,3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction and 6) machine specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them together. The demonstration will exhibit our latest research in building this environment: 1. Parallelizing tools and compiler evaluation. 2. Code cleanup and serial optimization using automated scripts 3. Development of a code generator for performance prediction 4. Automated partitioning 5. Automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.

  17. Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widlund, Olof B.

    2015-06-09

    The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less

  18. Accelerating electron tomography reconstruction algorithm ICON with GPU.

    PubMed

    Chen, Yu; Wang, Zihao; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Sun, Fei; Zhang, Fa

    2017-01-01

    Electron tomography (ET) plays an important role in studying in situ cell ultrastructure in three-dimensional space. Due to limited tilt angles, ET reconstruction always suffers from the "missing wedge" problem. With a validation procedure, iterative compressed-sensing optimized NUFFT reconstruction (ICON) demonstrates its power in the restoration of validated missing information for low SNR biological ET dataset. However, the huge computational demand has become a major problem for the application of ICON. In this work, we analyzed the framework of ICON and classified the operations of major steps of ICON reconstruction into three types. Accordingly, we designed parallel strategies and implemented them on graphics processing units (GPU) to generate a parallel program ICON-GPU. With high accuracy, ICON-GPU has a great acceleration compared to its CPU version, up to 83.7×, greatly relieving ICON's dependence on computing resource.

  19. A third-generation density-functional-theory-based method for calculating canonical molecular orbitals of large molecules.

    PubMed

    Hirano, Toshiyuki; Sato, Fumitoshi

    2014-07-28

    We used grid-free modified Cholesky decomposition (CD) to develop a density-functional-theory (DFT)-based method for calculating the canonical molecular orbitals (CMOs) of large molecules. Our method can be used to calculate standard CMOs, analytically compute exchange-correlation terms, and maximise the capacity of next-generation supercomputers. Cholesky vectors were first analytically downscaled using low-rank pivoted CD and CD with adaptive metric (CDAM). The obtained Cholesky vectors were distributed and stored on each computer node in a parallel computer, and the Coulomb, Fock exchange, and pure exchange-correlation terms were calculated by multiplying the Cholesky vectors without evaluating molecular integrals in self-consistent field iterations. Our method enables DFT and massively distributed memory parallel computers to be used in order to very efficiently calculate the CMOs of large molecules.

  20. Chevron beam dump for ITER edge Thomson scattering system.

    PubMed

    Yatsuka, E; Hatae, T; Vayakis, G; Bassan, M; Itami, K

    2013-10-01

    This paper contains the design of the beam dump for the ITER edge Thomson scattering system and mainly concerns its lifetime under the harsh thermal and electromagnetic loads as well as tight space allocation. The lifetime was estimated from the multi-pulse laser-induced damage threshold. In order to extend its lifetime, the structure of the beam dump was optimized. A number of bent sheets aligned parallel in the beam dump form a shape called a chevron which enables it to avoid the concentration of the incident laser pulse energy. The chevron beam dump is expected to withstand thermal loads due to nuclear heating, radiation from the plasma, and numerous incident laser pulses throughout the entire ITER project with a reasonable margin for the peak factor of the beam profile. Structural analysis was also carried out in case of electromagnetic loads during a disruption. Moreover, detailed issues for more accurate assessments of the beam dump's lifetime are clarified. Variation of the bi-directional reflection distribution function (BRDF) due to erosion by or contamination of neutral particles derived from the plasma is one of the most critical issues that needs to be resolved. In this paper, the BRDF was assumed, and the total amount of stray light and the absorbed laser energy profile on the beam dump were evaluated.

  1. Parallel synthesis of a series of potentially brain penetrant aminoalkyl benzoimidazoles.

    PubMed

    Micco, Iolanda; Nencini, Arianna; Quinn, Joanna; Bothmann, Hendrick; Ghiron, Chiara; Padova, Alessandro; Papini, Silvia

    2008-03-01

    Alpha7 agonists were identified via GOLD (CCDC) docking in the putative agonist binding site of an alpha7 homology model and a series of aminoalkyl benzoimidazoles was synthesised to obtain potentially brain penetrant drugs. The array was prepared starting from the reaction of ortho-fluoronitrobenzenes with a selection of diamines, followed by reduction of the nitro group to obtain a series of monoalkylated phenylene diamines. N,N'-Carbonyldiimidazole (CDI) mediated acylation, followed by a parallel automated work-up procedure, afforded the monoacylated phenylenediamines which were cyclised under acidic conditions. Parallel work-up and purification afforded the array products in good yields and purities with a robust parallel methodology which will be useful for other libraries. Screening for alpha7 activity revealed compounds with agonist activity for the receptor.

  2. A parallel algorithm for 2D visco-acoustic frequency-domain full-waveform inversion: application to a dense OBS data set

    NASA Astrophysics Data System (ADS)

    Sourbier, F.; Operto, S.; Virieux, J.

    2006-12-01

    We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in fortran90 and use MPI for parallelism. The algorithm was applied to real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern-Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies by proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency-domain, solving the wave equation requires resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computer to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided in 3 main steps: a symbolic analysis step that performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization and an estimation of the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accomodate numerical pivoting and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, 2 simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute in parallel the gradient of the cost function. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communation. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non redondant shot and receiver position. The same strategy that the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 and 15 Hz were inverted. Tweny iterations per frequency were computed leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km corresponding to a finite-difference grid of 4201 x 1001 grid with a 25-m grid interval. The number of shot was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bits bi-processor nodes with 4 Gbytes of RAM memory per node when only the LU factorization is performed in parallel. Preliminary estimations of the time required to perform the inversion with the fully-parallelized code is 6 and 4 days using 20 and 50 processors respectively.

  3. Current status and recent research achievements in ferritic/martensitic steels

    NASA Astrophysics Data System (ADS)

    Tavassoli, A.-A. F.; Diegele, E.; Lindau, R.; Luzginova, N.; Tanigawa, H.

    2014-12-01

    When the austenitic stainless steel 316L(N) was selected for ITER, it was well known that it would not be suitable for DEMO and fusion reactors due to its irradiation swelling at high doses. A parallel programme to ITER collaboration already had been put in place, under an IEA fusion materials implementing agreement for the development of a low activation ferritic/martensitic steel, known for their excellent high dose irradiation swelling resistance. After extensive screening tests on different compositions of Fe-Cr alloys, the chromium range was narrowed to 7-9% and the first RAFM was industrially produced in Japan (F82H: Fe-8%Cr-2%W-TaV). All IEA partners tested this steel and contributed to its maturity. In parallel several other RAFM steels were produced in other countries. From those experiences and also for improving neutron efficiency and corrosion resistance, European Union opted for a higher chromium lower tungsten grade, Fe-9%Cr-1%W-TaV steel (Eurofer), and in 1997 ordered the first industrial heats. Other industrial heats have been produced since and characterised in different states, including irradiated up to 80 dpa. China, India, Russia, Korea and US have also produced their grades of RAFM steels, contributing to overall maturity of these steels. This paper reviews the work done on RAFM steels by the fusion materials community over the past 30 years, in particular on the Eurofer steel and its design code qualification for RCC-MRx.

  4. Tomographic image reconstruction using the cell broadband engine (CBE) general purpose hardware

    NASA Astrophysics Data System (ADS)

    Knaup, Michael; Steckmann, Sven; Bockenbach, Olivier; Kachelrieß, Marc

    2007-02-01

    Tomographic image reconstruction, such as the reconstruction of CT projection values, of tomosynthesis data, PET or SPECT events, is computational very demanding. In filtered backprojection as well as in iterative reconstruction schemes, the most time-consuming steps are forward- and backprojection which are often limited by the memory bandwidth. Recently, a novel general purpose architecture optimized for distributed computing became available: the Cell Broadband Engine (CBE). Its eight synergistic processing elements (SPEs) currently allow for a theoretical performance of 192 GFlops (3 GHz, 8 units, 4 floats per vector, 2 instructions, multiply and add, per clock). To maximize image reconstruction speed we modified our parallel-beam and perspective backprojection algorithms which are highly optimized for standard PCs, and optimized the code for the CBE processor. 1-3 In addition, we implemented an optimized perspective forwardprojection on the CBE which allows us to perform statistical image reconstructions like the ordered subset convex (OSC) algorithm. 4 Performance was measured using simulated data with 512 projections per rotation and 5122 detector elements. The data were backprojected into an image of 512 3 voxels using our PC-based approaches and the new CBE- based algorithms. Both the PC and the CBE timings were scaled to a 3 GHz clock frequency. On the CBE, we obtain total reconstruction times of 4.04 s for the parallel backprojection, 13.6 s for the perspective backprojection and 192 s for a complete OSC reconstruction, consisting of one initial Feldkamp reconstruction, followed by 4 OSC iterations.

  5. Lesson from Tungsten Leading Edge Heat Load Analysis in KSTAR Divertor

    NASA Astrophysics Data System (ADS)

    Hong, Suk-Ho; Pitts, Richard Anthony; Lee, Hyeong-Ho; Bang, Eunnam; Kang, Chan-Soo; Kim, Kyung-Min; Kim, Hong-Tack; ITER Organization Collaboration; Kstar Team Team

    2016-10-01

    An important design issue for the ITER tungsten (W) divertor and in fact for all such components using metallic plasma-facing elements and which are exposed to high parallel power fluxes, is the question of surface shaping to avoid melting of leading edges. We have fabricated a series of tungsten blocks with a variety of leading edge heights (0.3, 0.6, 1.0, and 2.0 mm), from the ITER worst case to heights even beyond the extreme value tested on JET. They are mounted into adjacent, inertially cooled graphite tile installed in the central divertor region of KSTAR, within the field of view of an infra-red (IR) thermography system with a spatial resolution to 0.4 mm/pixel. Adjustment of the outer divertor strike point position is used to deposit power on the different blocks in different discharges. The measured power flux density on flat regions of the surrounding graphite tiles is used to obtain the parallel power flux, q|| impinging on the various W blocks. Experiments have been performed in Type I ELMing H-mode with Ip = 600 kA, BT = 2 T, PNBI = 3.5 MW, leading to a hot attached divertor with typical pulse lengths of 10 s. Three dimensional ANSYS simulations using q|| and assuming geometric projection of the heat flux are found to be consistent with the observed edge loading. This research was partially supported by Ministry of Science, ICT, and Future Planning under KSTAR project.

  6. SQDFT: Spectral Quadrature method for large-scale parallel O(N) Kohn-Sham calculations at high temperature

    NASA Astrophysics Data System (ADS)

    Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.

    2018-03-01

    We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.

  7. Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

    NASA Astrophysics Data System (ADS)

    Xing, F.; Masson, R.; Lopez, S.

    2017-09-01

    This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.

  8. An L1-norm phase constraint for half-Fourier compressed sensing in 3D MR imaging.

    PubMed

    Li, Guobin; Hennig, Jürgen; Raithel, Esther; Büchert, Martin; Paul, Dominik; Korvink, Jan G; Zaitsev, Maxim

    2015-10-01

    In most half-Fourier imaging methods, explicit phase replacement is used. In combination with parallel imaging, or compressed sensing, half-Fourier reconstruction is usually performed in a separate step. The purpose of this paper is to report that integration of half-Fourier reconstruction into iterative reconstruction minimizes reconstruction errors. The L1-norm phase constraint for half-Fourier imaging proposed in this work is compared with the L2-norm variant of the same algorithm, with several typical half-Fourier reconstruction methods. Half-Fourier imaging with the proposed phase constraint can be seamlessly combined with parallel imaging and compressed sensing to achieve high acceleration factors. In simulations and in in-vivo experiments half-Fourier imaging with the proposed L1-norm phase constraint enables superior performance both reconstruction of image details and with regard to robustness against phase estimation errors. The performance and feasibility of half-Fourier imaging with the proposed L1-norm phase constraint is reported. Its seamless combination with parallel imaging and compressed sensing enables use of greater acceleration in 3D MR imaging.

  9. Three-dimensional Finite Element Formulation and Scalable Domain Decomposition for High Fidelity Rotor Dynamic Analysis

    NASA Technical Reports Server (NTRS)

    Datta, Anubhav; Johnson, Wayne R.

    2009-01-01

    This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.

  10. On the Development of an Efficient Parallel Hybrid Solver with Application to Acoustically Treated Aero-Engine Nacelles

    NASA Technical Reports Server (NTRS)

    Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj

    2006-01-01

    A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.

  11. Robust non-fragile finite-frequency H∞ static output-feedback control for active suspension systems

    NASA Astrophysics Data System (ADS)

    Wang, Gang; Chen, Changzheng; Yu, Shenbo

    2017-07-01

    This paper deals with the problem of non-fragile H∞ static output-feedback control of vehicle active suspension systems with finite-frequency constraint. The control objective is to improve ride comfort within the given frequency range and ensure the hard constraints in the time-domain. Moreover, in order to enhance the robustness of the controller, the control gain perturbation is also considered in controller synthesis. Firstly, a new non-fragile H∞ finite-frequency control condition is established by using generalized Kalman-Yakubovich-Popov (GKYP) lemma. Secondly, the static output-feedback control gain is directly derived by using a non-iteration algorithm. Different from the existing iteration LMI results, the static output-feedback design is simple and less conservative. Finally, the proposed control algorithm is applied to a quarter-car active suspension model with actuator dynamics, numerical results are made to show the effectiveness and merits of the proposed method.

  12. Modern Workflow Full Waveform Inversion Applied to North America and the Northern Atlantic

    NASA Astrophysics Data System (ADS)

    Krischer, Lion; Fichtner, Andreas; Igel, Heiner

    2015-04-01

    We present the current state of a new seismic tomography model obtained using full waveform inversion of the crustal and upper mantle structure beneath North America and the Northern Atlantic, including the westernmost part of Europe. Parts of the eastern portion of the initial model consists of previous models by Fichtner et al. (2013) and Rickers et al. (2013). The final results of this study will contribute to the 'Comprehensive Earth Model' being developed by the Computational Seismology group at ETH Zurich. Significant challenges include the size of the domain, the uneven event and station coverage, and the strong east-west alignment of seismic ray paths across the North Atlantic. We use as much data as feasible, resulting in several thousand recordings per event depending on the receivers deployed at the earthquakes' origin times. To manage such projects in a reproducible and collaborative manner, we, as tomographers, should abandon ad-hoc scripts and one-time programs, and adopt sustainable and reusable solutions. Therefore we developed the LArge-scale Seismic Inversion Framework (LASIF - http://lasif.net), an open-source toolbox for managing seismic data in the context of non-linear iterative inversions that greatly reduces the time to research. Information on the applied processing, modelling, iterative model updating, what happened during each iteration, and so on are systematically archived. This results in a provenance record of the final model which in the end significantly enhances the reproducibility of iterative inversions. Additionally, tools for automated data download across different data centers, window selection, misfit measurements, parallel data processing, and input file generation for various forward solvers are provided.

  13. Optimizing convergence rates of alternating minimization reconstruction algorithms for real-time explosive detection applications

    NASA Astrophysics Data System (ADS)

    Bosch, Carl; Degirmenci, Soysal; Barlow, Jason; Mesika, Assaf; Politte, David G.; O'Sullivan, Joseph A.

    2016-05-01

    X-ray computed tomography reconstruction for medical, security and industrial applications has evolved through 40 years of experience with rotating gantry scanners using analytic reconstruction techniques such as filtered back projection (FBP). In parallel, research into statistical iterative reconstruction algorithms has evolved to apply to sparse view scanners in nuclear medicine, low data rate scanners in Positron Emission Tomography (PET) [5, 7, 10] and more recently to reduce exposure to ionizing radiation in conventional X-ray CT scanners. Multiple approaches to statistical iterative reconstruction have been developed based primarily on variations of expectation maximization (EM) algorithms. The primary benefit of EM algorithms is the guarantee of convergence that is maintained when iterative corrections are made within the limits of convergent algorithms. The primary disadvantage, however is that strict adherence to correction limits of convergent algorithms extends the number of iterations and ultimate timeline to complete a 3D volumetric reconstruction. Researchers have studied methods to accelerate convergence through more aggressive corrections [1], ordered subsets [1, 3, 4, 9] and spatially variant image updates. In this paper we describe the development of an AM reconstruction algorithm with accelerated convergence for use in a real-time explosive detection application for aviation security. By judiciously applying multiple acceleration techniques and advanced GPU processing architectures, we are able to perform 3D reconstruction of scanned passenger baggage at a rate of 75 slices per second. Analysis of the results on stream of commerce passenger bags demonstrates accelerated convergence by factors of 8 to 15, when comparing images from accelerated and strictly convergent algorithms.

  14. The SWATH Concept: Designing Superior Operability into a Surface Displacement Ship

    DTIC Science & Technology

    1975-12-01

    ACKNOWLEDGMENTS 140 REFERENCES 141 —^f* • ■ " mil,. .„.. • LIST OF FIGURES Page 1 — Artist’s Concept of a 4000-Ton ", WATH Combatant 4 2 ~ The...the design process for SWATH combatants is iterative. At either the feasibility or conceptual stage, the designer starts with a "reasonable" hull...parameters, the multitude of design factors and innumerable combinations thereof constitute a difficult synthesis problem. Because they are

  15. Research, Development and Testing of a Fault-Tolerant FPGA-Based Sequencer for CubeSat Launching Applications

    DTIC Science & Technology

    2013-03-01

    amounts of time and effort to implement. Future testing with commercial, fault-tolerant synthesis software, under a radiation environment, will yield ...initial viewpoint of the author is to take the flash-based FPGA route. This will yield a simple, reconfigurable circuit while providing the added...structure seen in Figure 30. Each of these full adder blocks were replaced in subsequent iterations to yield proper comparison with this baseline

  16. High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy.

    PubMed

    Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D

    2008-08-01

    The advent of readily available temporal imaging or time series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Due to long computation time, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With the recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general purpose computation on the GPU is limited because the constraints of the available programming platforms. As well, compared to CPU programming, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance will be compared with single threading and multithreading CPU implementations on an Intel dual core 2.4 GHz CPU using the C programming language. CUDA provides a C-like language programming interface, and allows for direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation time for 100 iterations in the range of 1.8-13.5 s was observed for the GPU with image size ranging from 2.0 x 10(6) to 14.2 x 10(6) pixels. The GPU registration was 55-61 times faster than the CPU for the single threading implementation, and 34-39 times faster for the multithreading implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data. Computational efficiency is characterized in terms of time per megapixels per iteration (TPMI) with units of seconds per megapixels per iteration (or spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI =0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU based real-time DIR opens up a host of clinical applications for medical imaging.

  17. Current status and future prospects for enabling chemistry technology in the drug discovery process.

    PubMed

    Djuric, Stevan W; Hutchins, Charles W; Talaty, Nari N

    2016-01-01

    This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of "dangerous" reagents. Also featured are advances in the "computer-assisted drug design" area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities.

  18. An accurate, fast, and scalable solver for high-frequency wave propagation

    NASA Astrophysics Data System (ADS)

    Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.

    2017-12-01

    In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straight-forward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages. We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand-sides.

  19. Tinker-HP: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields.

    PubMed

    Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip

    2018-01-28

    We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed.

  20. Formation of organic layer on femtosecond laser-induced periodic surface structures

    NASA Astrophysics Data System (ADS)

    Yasumaru, Naoki; Sentoku, Eisuke; Kiuchi, Junsuke

    2017-05-01

    Two types of laser-induced periodic surface structures (LIPSS) formed on titanium by femtosecond (fs) laser pulses (λ = 800 nm, τ = 180 fs, ν = 1 kHz) in air were investigated experimentally. At a laser fluence F above the ablation threshold, LIPSS with a minimum mean spacing of D < λ⁄2 were observed perpendicular to the laser polarization direction. In contrast, for F slightly below than the ablation threshold, ultrafine LIPSS with a minimum value of D < λ/10 were formed parallel to the polarization direction. The surface roughness of the parallel-oriented LIPSS was almost the same as that of the non-irradiated surface, unlike the high roughness of the perpendicular-oriented LIPSS. In addition, although the surface state of the parallel-oriented LIPSS was the same as that of the non-irradiated surface, the perpendicular-oriented LIPSS were covered with an organic thin film similar to a cellulose derivative that cannot be easily formed by conventional chemical synthesis. The results of these surface analyses indicate that these two types of LIPSS are formed through different mechanisms. This fs-laser processing technique may become a new technology for the artificial synthesis of cellulose derivatives.

  1. A parallel computing engine for a class of time critical processes.

    PubMed

    Nabhan, T M; Zomaya, A Y

    1997-01-01

    This paper focuses on the efficient parallel implementation of systems of numerically intensive nature over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constants. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.

  2. StrAuto: automation and parallelization of STRUCTURE analysis.

    PubMed

    Chhatre, Vikram E; Emerson, Kevin J

    2017-03-24

    Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce computational overload of this analysis, it does not fully automate the use of replicate STRUCTURE analysis runs required for downstream inference of optimal K. There is pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto, to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno Δ K analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation - a set up ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and available to download from http://strauto.popgen.org .

  3. Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue

    NASA Astrophysics Data System (ADS)

    Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.

    2017-11-01

    The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).

  4. Chrestenson transform FPGA embedded factorizations.

    PubMed

    Corinthios, Michael J

    2016-01-01

    Chrestenson generalized Walsh transform factorizations for parallel processing imbedded implementations on field programmable gate arrays are presented. This general base transform, sometimes referred to as the Discrete Chrestenson transform, has received special attention in recent years. In fact, the Discrete Fourier transform and Walsh-Hadamard transform are but special cases of the Chrestenson generalized Walsh transform. Rotations of a base-p hypercube, where p is an arbitrary integer, are shown to produce dynamic contention-free memory allocation, in processor architecture. The approach is illustrated by factorizations involving the processing of matrices of the transform which are function of four variables. Parallel operations are implemented matrix multiplications. Each matrix, of dimension N × N, where N = p (n) , n integer, has a structure that depends on a variable parameter k that denotes the iteration number in the factorization process. The level of parallelism, in the form of M = p (m) processors can be chosen arbitrarily by varying m between zero to its maximum value of n - 1. The result is an equation describing the generalised parallelism factorization as a function of the four variables n, p, k and m. Applications of the approach are shown in relation to configuring field programmable gate arrays for digital signal processing applications.

  5. Parallel synthesis of libraries of anodic and cathodic functionalized electrodeposition paints as immobilization matrix for amperometric biosensors.

    PubMed

    Ngounou, Bertrand; Aliyev, Elchin H; Guschin, Dmitrii A; Sultanov, Yusif M; Efendiev, Ayaz A; Schuhmann, Wolfgang

    2007-09-01

    The integration of flexible anchoring groups bearing imidazolyl or pyridyl substituents into the structure of electrodeposition paints (EDP) is the basis for the parallel synthesis of a library containing 107 members of different cathodic and anodic EDPs with a high variation in polymer properties. The obtained EDPs were used as immobilization matrix for biosensor fabrication using glucose oxidase as a model enzyme. Amperometric glucose sensors based on the different EDPs showed a wide variation in their sensor characteristics with respect to the apparent Michaelis-Menten constant (KM(app)) representing the linear measuring range and the maximum current (Imax(app)). Based on these results first assumptions concerning the impact of different side chains in the EDP on the expected biosensor properties could be obtained allowing for an improved rational optimization of EDPs used as immobilization matrix in amperometric biosensors.

  6. Synthesis of most polyene natural product motifs using just twelve building blocks and one coupling reaction

    PubMed Central

    Woerly, Eric M.; Roy, Jahnabi; Burke, Martin D.

    2014-01-01

    The inherent modularity of polypeptides, oligonucleotides, and oligosaccharides has been harnessed to achieve generalized building block-based synthesis platforms. Importantly, like these other targets, most small molecule natural products are biosynthesized via iterative coupling of bifunctional building blocks. This suggests that many small molecules also possess inherent modularity commensurate with systematic building block-based construction. Supporting this hypothesis, here we report that the polyene motifs found in >75% of all known polyene natural products can be synthesized using just 12 building blocks and one coupling reaction. Using the same general retrosynthetic algorithm and reaction conditions, this platform enabled the synthesis of a wide range of polyene frameworks covering all of this natural product chemical space, and first total syntheses of the polyene natural products asnipyrone B, physarigin A, and neurosporaxanthin β-D-glucopyranoside. Collectively, these results suggest the potential for a more generalized approach for making small molecules in the laboratory. PMID:24848233

  7. Robust Frequency-Domain Constrained Feedback Design via a Two-Stage Heuristic Approach.

    PubMed

    Li, Xianwei; Gao, Huijun

    2015-10-01

    Based on a two-stage heuristic method, this paper is concerned with the design of robust feedback controllers with restricted frequency-domain specifications (RFDSs) for uncertain linear discrete-time systems. Polytopic uncertainties are assumed to enter all the system matrices, while RFDSs are motivated by the fact that practical design specifications are often described in restricted finite frequency ranges. Dilated multipliers are first introduced to relax the generalized Kalman-Yakubovich-Popov lemma for output feedback controller synthesis and robust performance analysis. Then a two-stage approach to output feedback controller synthesis is proposed: at the first stage, a robust full-information (FI) controller is designed, which is used to construct a required output feedback controller at the second stage. To improve the solvability of the synthesis method, heuristic iterative algorithms are further formulated for exploring the feedback gain and optimizing the initial FI controller at the individual stage. The effectiveness of the proposed design method is finally demonstrated by the application to active control of suspension systems.

  8. Synthesis of hydroxyphthioceranic acid using a traceless lithiation-borylation-protodeboronation strategy

    NASA Astrophysics Data System (ADS)

    Rasappan, Ramesh; Aggarwal, Varinder K.

    2014-09-01

    In planning organic syntheses, disconnections are most often made adjacent to functional groups, which assist in C-C bond formation. For molecules devoid of obvious functional groups this approach presents a problem, and so functionalities must be installed temporarily and then removed. Here we present a traceless strategy for organic synthesis that uses a boronic ester as such a group in a one-pot lithiation-borylation-protodeboronation sequence. To realize this strategy, we developed a methodology for the protodeboronation of alkyl pinacol boronic esters that involves the formation of a boronate complex with a nucleophile followed by oxidation with Mn(OAc)3 in the presence of the hydrogen-atom donor 4-tert-butylcatechol. Iterative lithiation-borylation-protodeboronation allows the coupling of smaller fragments to build-up long alkyl chains. We employed this strategy in the synthesis of hydroxyphthioceranic acid, a key component of the cell-wall lipid of the virulent Mycobacterium tuberculosis, in just 14 steps (longest linear sequence) with full stereocontrol.

  9. Method and apparatus for combinatorial chemistry

    DOEpatents

    Foote, Robert S.

    2007-02-20

    A method and apparatus are provided for performing light-directed reactions in spatially addressable channels within a plurality of channels. One aspect of the invention employs photoactivatable reagents in solutions disposed into spatially addressable flow streams to control the parallel synthesis of molecules immobilized within the channels. The reagents may be photoactivated within a subset of channels at the site of immobilized substrate molecules or at a light-addressable site upstream from the substrate molecules. The method and apparatus of the invention find particularly utility in the synthesis of biopolymer arrays, e.g., oligonucleotides, peptides and carbohydrates, and in the combinatorial synthesis of small molecule arrays for drug discovery.

  10. Method and apparatus for combinatorial chemistry

    DOEpatents

    Foote, Robert S [Oak Ridge, TN

    2012-06-05

    A method and apparatus are provided for performing light-directed reactions in spatially addressable channels within a plurality of channels. One aspect of the invention employs photoactivatable reagents in solutions disposed into spatially addressable flow streams to control the parallel synthesis of molecules immobilized within the channels. The reagents may be photoactivated within a subset of channels at the site of immobilized substrate molecules or at a light-addressable site upstream from the substrate molecules. The method and apparatus of the invention find particularly utility in the synthesis of biopolymer arrays, e.g., oligonucleotides, peptides and carbohydrates, and in the combinatorial synthesis of small molecule arrays for drug discovery.

  11. Application of advanced multidisciplinary analysis and optimization methods to vehicle design synthesis

    NASA Technical Reports Server (NTRS)

    Consoli, Robert David; Sobieszczanski-Sobieski, Jaroslaw

    1990-01-01

    Advanced multidisciplinary analysis and optimization methods, namely system sensitivity analysis and non-hierarchical system decomposition, are applied to reduce the cost and improve the visibility of an automated vehicle design synthesis process. This process is inherently complex due to the large number of functional disciplines and associated interdisciplinary couplings. Recent developments in system sensitivity analysis as applied to complex non-hierarchic multidisciplinary design optimization problems enable the decomposition of these complex interactions into sub-processes that can be evaluated in parallel. The application of these techniques results in significant cost, accuracy, and visibility benefits for the entire design synthesis process.

  12. A Parallel Genetic Algorithm for Automated Electronic Circuit Design

    NASA Technical Reports Server (NTRS)

    Long, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris

    2000-01-01

    Parallelized versions of genetic algorithms (GAs) are popular primarily for three reasons: the GA is an inherently parallel algorithm, typical GA applications are very compute intensive, and powerful computing platforms, especially Beowulf-style computing clusters, are becoming more affordable and easier to implement. In addition, the low communication bandwidth required allows the use of inexpensive networking hardware such as standard office ethernet. In this paper we describe a parallel GA and its use in automated high-level circuit design. Genetic algorithms are a type of trial-and-error search technique that are guided by principles of Darwinian evolution. Just as the genetic material of two living organisms can intermix to produce offspring that are better adapted to their environment, GAs expose genetic material, frequently strings of 1s and Os, to the forces of artificial evolution: selection, mutation, recombination, etc. GAs start with a pool of randomly-generated candidate solutions which are then tested and scored with respect to their utility. Solutions are then bred by probabilistically selecting high quality parents and recombining their genetic representations to produce offspring solutions. Offspring are typically subjected to a small amount of random mutation. After a pool of offspring is produced, this process iterates until a satisfactory solution is found or an iteration limit is reached. Genetic algorithms have been applied to a wide variety of problems in many fields, including chemistry, biology, and many engineering disciplines. There are many styles of parallelism used in implementing parallel GAs. One such method is called the master-slave or processor farm approach. In this technique, slave nodes are used solely to compute fitness evaluations (the most time consuming part). The master processor collects fitness scores from the nodes and performs the genetic operators (selection, reproduction, variation, etc.). Because of dependency issues in the GA, it is possible to have idle processors. However, as long as the load at each processing node is similar, the processors are kept busy nearly all of the time. In applying GAs to circuit design, a suitable genetic representation 'is that of a circuit-construction program. We discuss one such circuit-construction programming language and show how evolution can generate useful analog circuit designs. This language has the desirable property that virtually all sets of combinations of primitives result in valid circuit graphs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. Using a parallel genetic algorithm and circuit simulation software, we present experimental results as applied to three analog filter and two amplifier design tasks. For example, a figure shows an 85 dB amplifier design evolved by our system, and another figure shows the performance of that circuit (gain and frequency response). In all tasks, our system is able to generate circuits that achieve the target specifications.

  13. Construction, classification and parametrization of complex Hadamard matrices

    NASA Astrophysics Data System (ADS)

    Szöllősi, Ferenc

    To improve the design of nuclear systems, high-fidelity neutron fluxes are required. Leadership-class machines provide platforms on which very large problems can be solved. Computing such fluxes efficiently requires numerical methods with good convergence properties and algorithms that can scale to hundreds of thousands of cores. Many 3-D deterministic transport codes are decomposable in space and angle only, limiting them to tens of thousands of cores. Most codes rely on methods such as Gauss Seidel for fixed source problems and power iteration for eigenvalue problems, which can be slow to converge for challenging problems like those with highly scattering materials or high dominance ratios. Three methods have been added to the 3-D SN transport code Denovo that are designed to improve convergence and enable the full use of cutting-edge computers. The first is a multigroup Krylov solver that converges more quickly than Gauss Seidel and parallelizes the code in energy such that Denovo can use hundreds of thousand of cores effectively. The second is Rayleigh quotient iteration (RQI), an old method applied in a new context. This eigenvalue solver finds the dominant eigenvalue in a mathematically optimal way and should converge in fewer iterations than power iteration. RQI creates energy-block-dense equations that the new Krylov solver treats efficiently. However, RQI can have convergence problems because it creates poorly conditioned systems. This can be overcome with preconditioning. The third method is a multigrid-in-energy preconditioner. The preconditioner takes advantage of the new energy decomposition because the grids are in energy rather than space or angle. The preconditioner greatly reduces iteration count for many problem types and scales well in energy. It also allows RQI to be successful for problems it could not solve otherwise. The methods added to Denovo accomplish the goals of this work. They converge in fewer iterations than traditional methods and enable the use of hundreds of thousands of cores. Each method can be used individually, with the multigroup Krylov solver and multigrid-in-energy preconditioner being particularly successful on their own. The largest benefit, though, comes from using these methods in concert.

  14. The application of the large particles method of numerical modeling of the process of carbonic nanostructures synthesis in plasma

    NASA Astrophysics Data System (ADS)

    Abramov, G. V.; Gavrilov, A. N.

    2018-03-01

    The article deals with the numerical solution of the mathematical model of the particles motion and interaction in multicomponent plasma by the example of electric arc synthesis of carbon nanostructures. The high order of the particles and the number of their interactions requires a significant input of machine resources and time for calculations. Application of the large particles method makes it possible to reduce the amount of computation and the requirements for hardware resources without affecting the accuracy of numerical calculations. The use of technology of GPGPU parallel computing using the Nvidia CUDA technology allows organizing all General purpose computation on the basis of the graphical processor graphics card. The comparative analysis of different approaches to parallelization of computations to speed up calculations with the choice of the algorithm in which to calculate the accuracy of the solution shared memory is used. Numerical study of the influence of particles density in the macro particle on the motion parameters and the total number of particle collisions in the plasma for different modes of synthesis has been carried out. The rational range of the coherence coefficient of particle in the macro particle is computed.

  15. Design of a broadband active silencer using μ-synthesis

    NASA Astrophysics Data System (ADS)

    Bai, Mingsian R.; Zeung, Pingshun

    2004-01-01

    A robust spatially feedforward controller is developed for broadband attenuation of noise in ducts. To meet the requirements of robust performance and robust stability in the presence of plant uncertainties, a μ-synthesis procedure via D- K iteration is exploited to obtain the optimal controller. This approach considers uncertainties as modelling errors of the nominal plant in high frequency and is implemented using a floating point digital signal processor (DSP). Experimental investigation was undertaken on a finite-length duct to justify the proposed controller. The μ- controller is compared to other control algorithms such as the H2 method, the H∞ method and the filtered-U least mean square (FULMS) algorithm. Experimental results indicate that the proposed system has attained 25.8 dB maximal attenuation in the band 250-650 Hz.

  16. Multi-component Wronskian solution to the Kadomtsev-Petviashvili equation

    NASA Astrophysics Data System (ADS)

    Xu, Tao; Sun, Fu-Wei; Zhang, Yi; Li, Juan

    2014-01-01

    It is known that the Kadomtsev-Petviashvili (KP) equation can be decomposed into the first two members of the coupled Ablowitz-Kaup-Newell-Segur (AKNS) hierarchy by the binary non-linearization of Lax pairs. In this paper, we construct the N-th iterated Darboux transformation (DT) for the second- and third-order m-coupled AKNS systems. By using together the N-th iterated DT and Cramer's rule, we find that the KPII equation has the unreduced multi-component Wronskian solution and the KPI equation admits a reduced multi-component Wronskian solution. In particular, based on the unreduced and reduced two-component Wronskians, we obtain two families of fully-resonant line-soliton solutions which contain arbitrary numbers of asymptotic solitons as y → ∓∞ to the KPII equation, and the ordinary N-soliton solution to the KPI equation. In addition, we find that the KPI line solitons propagating in parallel can exhibit the bound state at the moment of collision.

  17. Dakota, a multilevel parallel object-oriented framework for design optimization, parameter estimation, uncertainty quantification, and sensitivity analysis :

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Adams, Brian M.; Ebeida, Mohamed Salah; Eldred, Michael S.

    The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a exible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quanti cation with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components requiredmore » for iterative systems analyses, the Dakota toolkit provides a exible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.« less

  18. Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blanford, M.

    1997-12-31

    Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today`s multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemblemore » a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.« less

  19. Resistive MHD Stability Analysis in Near Real-time

    NASA Astrophysics Data System (ADS)

    Glasser, Alexander; Kolemen, Egemen

    2017-10-01

    We discuss the feasibility of a near real-time calculation of the tokamak Δ' matrix, which summarizes MHD stability to resistive modes, such as tearing and interchange modes. As the operational phase of ITER approaches, solutions for active feedback tokamak stability control are needed. It has been previously demonstrated that an ideal MHD stability analysis is achievable on a sub- O (1 s) timescale, as is required to control phenomena comparable with the MHD-evolution timescale of ITER. In the present work, we broaden this result to incorporate the effects of resistive MHD modes. Such modes satisfy ideal MHD equations in regions outside narrow resistive layers that form at singular surfaces. We demonstrate that the use of asymptotic expansions at the singular surfaces, as well as the application of state transition matrices, enable a fast, parallelized solution to the singular outer layer boundary value problem, and thereby rapidly compute Δ'. Sponsored by US DOE under DE-SC0015878 and DE-FC02-04ER54698.

  20. GPU computing with Kaczmarz’s and other iterative algorithms for linear systems

    PubMed Central

    Elble, Joseph M.; Sahinidis, Nikolaos V.; Vouzis, Panagiotis

    2009-01-01

    The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz’s, Cimmino’s, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are preformed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method. PMID:20526446

Top