DOE Office of Scientific and Technical Information (OSTI.GOV)
2004-04-01
Meros uses the composition, aggregation, and operator-overloading capabilities of TSF to provide an object-oriented package of segregated/block preconditioners for linear systems arising from fully-coupled Navier-Stokes problems. This class of preconditioners exploits the special properties of these problems to segregate the equations and apply multi-level preconditioners (through ML) to the matrix sub-blocks. Several preconditioners are provided, including the Fp and BFB preconditioners of Kay & Loghin and of Silvester, Elman, Kay & Wathen. The overall performance and scalability of these preconditioners approaches that of multigrid for certain types of problems. Meros also provides more traditional pressure projection methods, including SIMPLE and SIMPLEC.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shadid, John Nicolas; Elman, Howard; Shuttleworth, Robert R.
2007-04-01
In recent years, considerable effort has been placed on developing efficient and robust solution algorithms for the incompressible Navier-Stokes equations based on preconditioned Krylov methods. These include physics-based methods, such as SIMPLE, and purely algebraic preconditioners based on the approximation of the Schur complement. All these techniques can be represented as approximate block factorization (ABF) type preconditioners. The goal is to decompose the application of the preconditioner into simplified sub-systems in which scalable multi-level type solvers can be applied. In this paper we develop a taxonomy of these ideas based on an adaptation of a generalized approximate factorization of the Navier-Stokes system first presented in [25]. This taxonomy illuminates the similarities and differences among these preconditioners and the central role played by efficient approximation of certain Schur complement operators. We then present a parallel computational study that examines the performance of these methods and compares them to an additive Schwarz domain decomposition (DD) algorithm. Results are presented for two- and three-dimensional steady-state problems for enclosed domains and inflow/outflow systems on both structured and unstructured meshes. The numerical experiments are performed using MPSalsa, a stabilized finite element code.
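The block-factorization viewpoint above can be made concrete with a small numerical sketch. For a saddle-point system K = [[A, B^T], [B, 0]], a block upper-triangular preconditioner built from the exact Schur complement makes the preconditioned operator equal to the identity plus a nilpotent part, so GMRES converges in at most two iterations; practical ABF preconditioners replace the exact Schur complement with a cheap approximation. The matrices and sizes below are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Illustrative saddle-point system K = [[A, B^T], [B, 0]]
rng = np.random.default_rng(0)
n, m = 40, 10
A = np.diag(rng.uniform(1.0, 2.0, n))      # SPD "velocity" block
B = rng.standard_normal((m, n))            # full-rank constraint block
K = np.block([[A, B.T], [B, np.zeros((m, m))]])

S = -B @ np.linalg.solve(A, B.T)           # exact (negative) Schur complement

def apply_P_inv(r):
    # Back-substitution with the block upper-triangular factor P = [[A, B^T], [0, S]]
    r1, r2 = r[:n], r[n:]
    y2 = np.linalg.solve(S, r2)
    y1 = np.linalg.solve(A, r1 - B.T @ y2)
    return np.concatenate([y1, y2])

M = LinearOperator(K.shape, matvec=apply_P_inv)
b = rng.standard_normal(n + m)
x, info = gmres(K, b, M=M)                 # theory: at most 2 iterations
```

Replacing `np.linalg.solve(A, ...)` and the exact `S` with multilevel approximations recovers the ABF preconditioners surveyed in the abstract.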
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-Color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism available in the natural order, but the lengths of the wavefronts are often nonuniform. ILU(0) in the Multi-Color ordering is a simple way of achieving parallelism of order N, where N is the order of the matrix, but its convergence rate often deteriorates compared to that of the natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize interprocessor communication. SPAI computes a sparse approximate inverse directly by the least-squares method. Finally, ARMS is a preconditioner that recursively exploits the concept of independent sets, and pARMS is its parallel version. Experiments were conducted for Finite Difference and Finite Element discretizations of five two-dimensional PDEs with mesh sizes of up to a million on an IBM p595 machine with distributed memory. Our matrices are positive real, i.e., the real parts of their eigenvalues are positive. We used GMRES(m) as the outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communication was done using MPI (Message Passing Interface) primitives.
The results show that in general ILU(0) in the Multi-Color ordering and ILU(0) in the Wavefront ordering outperform the other methods, but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
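The parallelism that multi-color orderings expose can be illustrated with a small sketch: in a red-black (two-color) ordering of the 5-point Laplacian, every point of one color depends only on points of the other color, so each half-sweep of SOR updates a whole color simultaneously (vectorized here; distributed in the paper's MPI setting). This is a generic sketch, not code from the paper:

```python
import numpy as np

def red_black_sor(f, h, omega=1.5, sweeps=300):
    """Two-color SOR for -Laplace(u) = f on the unit square with zero
    Dirichlet data, 5-point stencil. Each color is updated in one
    vectorized step, which is the source of parallelism."""
    n = f.shape[0]
    u = np.zeros_like(f)
    I, J = np.meshgrid(np.arange(1, n - 1), np.arange(1, n - 1), indexing="ij")
    for _ in range(sweeps):
        for color in (0, 1):
            mask = (I + J) % 2 == color
            ii, jj = I[mask], J[mask]
            gs = 0.25 * (u[ii - 1, jj] + u[ii + 1, jj]
                         + u[ii, jj - 1] + u[ii, jj + 1] + h * h * f[ii, jj])
            u[ii, jj] = (1.0 - omega) * u[ii, jj] + omega * gs
    return u
```

The same two-color idea underlies the Multi-Color Block SOR variant, with the pointwise solve replaced by a per-block direct sparse solve.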
NASA Astrophysics Data System (ADS)
Badia, Santiago; Martín, Alberto F.; Planas, Ramon
2014-10-01
The thermally coupled incompressible inductionless magnetohydrodynamics (MHD) problem models the flow of an electrically charged fluid under the influence of an external electromagnetic field with thermal coupling. This system of partial differential equations is strongly coupled and highly nonlinear for real cases of interest. Therefore, fully implicit time integration schemes are very desirable in order to capture the different physical scales of the problem at hand. However, solving the multiphysics linear systems of equations resulting from such algorithms is a very challenging task which requires efficient and scalable preconditioners. In this work, a new family of recursive block LU preconditioners is designed and tested for solving the thermally coupled inductionless MHD equations. These preconditioners are obtained after splitting the fully coupled matrix into one-physics problems for every variable (velocity, pressure, current density, electric potential and temperature) that can be optimally solved, e.g., using preconditioned domain decomposition algorithms. The main idea is to arrange the original matrix into an (arbitrary) 2 × 2 block matrix, and consider an LU preconditioner obtained by approximating the corresponding Schur complement. For every one of the diagonal blocks in the LU preconditioner, if it involves more than one type of unknowns, we proceed the same way in a recursive fashion. This approach is stated in an abstract way, and can be straightforwardly applied to other multiphysics problems. Further, we precisely explain a flexible and general software design for the code implementation of this type of preconditioners.
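The recursive 2×2 splitting described above can be sketched in a few lines. Below, the inverse of the (1,1) block is approximated by its diagonal when forming the Schur complement (one common generic choice; the paper uses problem-specific, optimal one-physics solvers), and the construction recurses on the (2,2) block. The field sizes and matrices are illustrative:

```python
import numpy as np

def recursive_block_lu(A, splits):
    """Recursive 2x2 block LU preconditioner (a generic sketch): split the
    unknowns at splits[0], approximate A11^{-1} by diag(A11)^{-1} when
    forming the Schur complement, and recurse on the (2,2) block."""
    if not splits:
        Ainv = np.linalg.inv(A)            # leaf: a one-physics block
        return lambda r: Ainv @ r
    k = splits[0]
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    D1inv = 1.0 / np.diag(A11)
    S = A22 - A21 @ (D1inv[:, None] * A12)  # approximate Schur complement
    solve_1 = recursive_block_lu(A11, [])
    solve_S = recursive_block_lu(S, [s - k for s in splits[1:]])
    def apply(r):
        r1, r2 = r[:k], r[k:]
        y1 = solve_1(r1)                    # "L" (forward) solve
        y2 = solve_S(r2 - A21 @ y1)
        x1 = y1 - solve_1(A12 @ y2)         # "U" (backward) solve
        return np.concatenate([x1, y2])
    return apply
```

With several entries in `splits`, each Schur complement is itself split 2×2, mirroring the velocity/pressure/current/potential/temperature nesting in the abstract.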
Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods
NASA Astrophysics Data System (ADS)
Pazner, Will; Persson, Per-Olof
2018-02-01
In this paper, we develop a new tensor-product based preconditioner for discontinuous Galerkin methods with polynomial degrees higher than those typically employed. This preconditioner uses an automatic, purely algebraic method to approximate the exact block Jacobi preconditioner by Kronecker products of several small, one-dimensional matrices. Traditional matrix-based preconditioners require O(p^(2d)) storage and O(p^(3d)) computational work, where p is the degree of basis polynomials used, and d is the spatial dimension. Our SVD-based tensor-product preconditioner requires O(p^(d+1)) storage, O(p^(d+1)) work in two spatial dimensions, and O(p^(d+2)) work in three spatial dimensions. Combined with a matrix-free Newton-Krylov solver, these preconditioners allow for the solution of DG systems in linear time in p per degree of freedom in 2D, and reduce the computational complexity from O(p^9) to O(p^5) in 3D. Numerical results are shown in 2D and 3D for the advection, Euler, and Navier-Stokes equations, using polynomials of degree up to p = 30. For many test cases, the preconditioner results in similar iteration counts when compared with the exact block Jacobi preconditioner, and performance is significantly improved for high polynomial degrees p.
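The purely algebraic Kronecker approximation at the heart of such preconditioners can be computed via an SVD of a rearrangement of the matrix (the Van Loan-Pitsianis construction). The sketch below finds the best Frobenius-norm rank-one Kronecker approximation M ≈ B ⊗ C; it is a generic illustration, not the paper's implementation:

```python
import numpy as np

def nearest_kronecker(M, pq, rs):
    """Best Frobenius-norm approximation M ≈ B ⊗ C via a rank-1 SVD of a
    rearrangement of M (Van Loan-Pitsianis). pq = shape of B, rs = shape
    of C, with M of shape (p*r, q*s)."""
    (p, q), (r, s) = pq, rs
    R = np.empty((p * q, r * s))
    for i in range(p):
        for j in range(q):
            # row (i*q + j) of R is the vectorized (i, j) sub-block of M
            R[i * q + j] = M[i * r:(i + 1) * r, j * s:(j + 1) * s].ravel()
    U, sig, Vt = np.linalg.svd(R, full_matrices=False)
    B = np.sqrt(sig[0]) * U[:, 0].reshape(p, q)
    C = np.sqrt(sig[0]) * Vt[0].reshape(r, s)
    return B, C
```

Applying the factors costs only small one-dimensional solves, which is what drives the storage and work reductions quoted in the abstract.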
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cortes, Adriano M.; Dalcin, Lisandro; Sarmiento, Adel F.
The recently introduced divergence-conforming B-spline discretizations allow the construction of smooth discrete velocity–pressure pairs for viscous incompressible flows that are at the same time inf–sup stable and pointwise divergence-free. When applied to the discretized Stokes problem, these spaces generate a symmetric and indefinite saddle-point linear system. The iterative method of choice to solve such a system is the Generalized Minimum Residual Method. This method lacks robustness, and one remedy is to use preconditioners. For linear systems of saddle-point type, a large family of preconditioners can be obtained by using a block factorization of the system. In this paper, we show how the nesting of “black-box” solvers and preconditioners can be put together in a block triangular strategy to build a scalable block preconditioner for the Stokes system discretized by divergence-conforming B-splines. Lastly, besides the well-known cavity flow problem, we used as benchmarks flows defined on complex geometries: an eccentric annulus and a hollow torus with an eccentric annular cross-section.
Cortes, Adriano M.; Dalcin, Lisandro; Sarmiento, Adel F.; ...
2016-10-19
Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction.
Fessler, J A; Booth, S D
1999-01-01
Gradient-based iterative methods often converge slowly for tomographic image reconstruction and image restoration problems, but can be accelerated by suitable preconditioners. Diagonal preconditioners offer some improvement in convergence rate, but do not incorporate the structure of the Hessian matrices in imaging problems. Circulant preconditioners can provide remarkable acceleration for inverse problems that are approximately shift-invariant, i.e., for those with approximately block-Toeplitz or block-circulant Hessians. However, in applications with nonuniform noise variance, such as that arising from Poisson statistics in emission tomography and in quantum-limited optical imaging, the Hessian of the weighted least-squares objective function is quite shift-variant, and circulant preconditioners perform poorly. Additional shift-variance is caused by edge-preserving regularization methods based on nonquadratic penalty functions. This paper describes new preconditioners that more accurately approximate the Hessian matrices of shift-variant imaging problems. Compared to diagonal or circulant preconditioning, the new preconditioners lead to significantly faster convergence rates for the unconstrained conjugate-gradient (CG) iteration. We also propose a new efficient method for the line-search step required by CG methods. Applications to positron emission tomography (PET) illustrate the method.
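For the approximately shift-invariant case, the circulant preconditioner mentioned above can be applied in O(n log n) per CG iteration using FFTs. A minimal 1-D sketch with a symmetric Toeplitz stand-in for the Hessian and a Strang-type circulant (the matrix is illustrative, not a PET system matrix):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, cg

# Illustrative SPD Toeplitz "Hessian"
n = 64
A = toeplitz(np.r_[2.5, -1.0, np.zeros(n - 2)])

# Strang-type circulant preconditioner: wrap the central diagonals around,
# then diagonalize with the FFT so each application costs O(n log n)
c = np.zeros(n)
c[0], c[1], c[-1] = 2.5, -1.0, -1.0
lam = np.fft.fft(c).real                    # circulant eigenvalues (all > 0 here)
M = LinearOperator((n, n),
                   matvec=lambda r: np.fft.ifft(np.fft.fft(np.ravel(r)) / lam).real)

b = np.ones(n)
its = []
x, info = cg(A, b, M=M, callback=lambda xk: its.append(1))
```

Here the preconditioned operator is the identity plus a rank-two correction (the two wrapped corners), so CG terminates in a handful of iterations; the shift-variant Hessians discussed in the abstract break exactly this structure.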
Fully implicit adaptive mesh refinement solver for 2D MHD
NASA Astrophysics Data System (ADS)
Philip, B.; Chacon, L.; Pernice, M.
2008-11-01
Application of implicit adaptive mesh refinement (AMR) to simulate resistive magnetohydrodynamics is described. Solving this challenging multi-scale, multi-physics problem can improve understanding of reconnection in magnetically-confined plasmas. AMR is employed to resolve extremely thin current sheets, essential for an accurate macroscopic description. Implicit time stepping allows us to accurately follow the dynamical time scale of the developing magnetic field, without being restricted by fast Alfvén time scales. At each time step, the large-scale system of nonlinear equations is solved by a Jacobian-free Newton-Krylov method together with a physics-based preconditioner. Each block within the preconditioner is solved optimally using the Fast Adaptive Composite grid method, which can be considered as a multiplicative Schwarz method on AMR grids. We will demonstrate the excellent accuracy and efficiency properties of the method with several challenging reduced MHD applications, including tearing, island coalescence, and tilt instabilities. B. Philip, L. Chacón, M. Pernice, J. Comput. Phys., in press (2008)
Incomplete Sparse Approximate Inverses for Parallel Preconditioning
Anzt, Hartwig; Huckle, Thomas K.; Bräckle, Jürgen; ...
2017-10-28
In this study, we propose a new preconditioning method that can be seen as a generalization of block-Jacobi methods, or as a simplification of the sparse approximate inverse (SAI) preconditioners. The “Incomplete Sparse Approximate Inverse” (ISAI) preconditioner is particularly efficient in the solution of sparse triangular linear systems of equations. Those arise, for example, in the context of incomplete factorization preconditioning. ISAI preconditioners can be generated via an algorithm providing fine-grained parallelism, which makes them attractive for hardware with a high concurrency level. Finally, in a study covering a large number of matrices, we identify the ISAI preconditioner as an attractive alternative to exact triangular solves in the context of incomplete factorization preconditioning.
FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics
NASA Astrophysics Data System (ADS)
Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio
2016-12-01
Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretization in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to deal efficiently with a large number of processes, FaCSI exploits efficient single-field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performance of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the number of cores (scalability of the preconditioner) and on the mesh size (optimality).
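The SIMPLE building block used inside preconditioners like the one above can be sketched generically for a saddle-point matrix: replace A^{-1} by diag(A)^{-1} when forming the pressure Schur complement, then perform predictor/corrector block solves. Everything below (sizes, matrices, the exact inner solves) is an illustrative sketch, not FaCSI itself:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def simple_factory(A, B):
    """SIMPLE-type preconditioner for K = [[A, B^T], [B, 0]] (a generic
    sketch): use D^{-1} = diag(A)^{-1} in place of A^{-1} for the pressure
    Schur complement, then predictor/corrector solves."""
    n = A.shape[0]
    Dinv = 1.0 / np.diag(A)
    S_hat = -(B * Dinv) @ B.T                   # approximate Schur complement
    def apply(r):
        r = np.ravel(r)
        r1, r2 = r[:n], r[n:]
        y1 = np.linalg.solve(A, r1)             # velocity predictor
        y2 = np.linalg.solve(S_hat, r2 - B @ y1)  # pressure correction
        y1 = y1 - Dinv * (B.T @ y2)             # velocity correction
        return np.concatenate([y1, y2])
    return apply

rng = np.random.default_rng(3)
n, m = 30, 8
R = rng.standard_normal((n, n))
A = np.diag(rng.uniform(2.0, 3.0, n)) + 0.05 * (R + R.T)  # nearly diagonal SPD
B = rng.standard_normal((m, n))
K = np.block([[A, B.T], [B, np.zeros((m, m))]])

M = LinearOperator(K.shape, matvec=simple_factory(A, B))
b = rng.standard_normal(n + m)
x, info = gmres(K, b, M=M)
```

In production settings the dense inner solves would themselves be replaced by the single-field multigrid or domain-decomposition preconditioners the abstract mentions.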
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics
NASA Astrophysics Data System (ADS)
Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.
2003-09-01
Multiconjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wave-front control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10^-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 10^3 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborate multiresolution preconditioner.
Scalable algorithms for three-field mixed finite element coupled poromechanics
NASA Astrophysics Data System (ADS)
Castelletto, Nicola; White, Joshua A.; Ferronato, Massimiliano
2016-12-01
We introduce a class of block preconditioners for accelerating the iterative solution of coupled poromechanics equations based on a three-field formulation. The use of a displacement/velocity/pressure mixed finite-element method combined with a first order backward difference formula for the approximation of time derivatives produces a sequence of linear systems with a 3 × 3 unsymmetric and indefinite block matrix. The preconditioners are obtained by approximating the two-level Schur complement with the aid of physically-based arguments that can be also generalized in a purely algebraic approach. A theoretical and experimental analysis is presented that provides evidence of the robustness, efficiency and scalability of the proposed algorithm. The performance is also assessed for a real-world challenging consolidation experiment of a shallow formation.
Algorithmically scalable block preconditioner for fully implicit shallow-water equations in CAM-SE
Lott, P. Aaron; Woodward, Carol S.; Evans, Katherine J.
2014-10-19
Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with the processes being studied. The dominant cost of an implicit time step is the ancillary linear system solves, so we have developed a preconditioner aimed at improving the efficiency of these linear system solves. Our preconditioner is based on an approximate block factorization of the linearized shallow-water equations and has been implemented within the spectral element dynamical core of the Community Atmospheric Model (CAM-SE). Furthermore, in this paper we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM-SE.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics.
Gilles, Luc; Ellerbroek, Brent L; Vogel, Curtis R
2003-09-10
NASA Astrophysics Data System (ADS)
Spiegelman, M.; Wilson, C. R.
2011-12-01
A quantitative theory of magma production and transport is essential for understanding the dynamics of magmatic plate boundaries, intra-plate volcanism and the geochemical evolution of the planet. It also provides one of the most challenging computational problems in solid Earth science, as it requires consistent coupling of fluid and solid mechanics together with the thermodynamics of melting and reactive flows. Considerable work on these problems over the past two decades shows that small changes in assumptions of coupling (e.g. the relationship between melt fraction and solid rheology) can have profound effects on the behavior of these systems, which in turn affects critical computational choices such as discretizations, solvers and preconditioners. To make progress in exploring and understanding this physically rich system requires a computational framework that allows more flexible, high-level description of multi-physics problems as well as increased flexibility in composing efficient algorithms for solution of the full non-linear coupled system. Fortunately, recent advances in available computational libraries and algorithms provide a platform for implementing such a framework. We present results from a new model building system that leverages functionality from both the FEniCS project (www.fenicsproject.org) and PETSc libraries (www.mcs.anl.gov/petsc) along with a model-independent options system and GUI, Spud (amcg.ese.ic.ac.uk/Spud). Key features from FEniCS include fully unstructured FEM with a wide range of elements; a high-level language (UFL) and code generation compiler (FFC) for describing the weak forms of residuals; and automatic differentiation for calculation of exact and approximate Jacobians. The overall strategy is to monitor/calculate residuals and Jacobians for the entire non-linear system of equations within a global non-linear solve based on PETSc's SNES routines.
PETSc already provides a wide range of solvers and preconditioners, from parallel sparse direct to algebraic multigrid, that can be chosen at runtime. In particular, we make extensive use of PETSc's FieldSplit block preconditioners, which allow us to use optimal solvers for subproblems (such as Stokes, or advection/diffusion of temperature) as preconditioners for the full problem. Thus these routines let us reuse effective solving recipes/splittings from previous experience while monitoring the convergence of the global problem. These techniques often yield quadratic (Newton-like) convergence for the work of standard Picard schemes. We will illustrate this new framework with examples from the Magma Dynamic Demonstration suite (MADDs) of well understood magma dynamics benchmark problems, including Stokes flow in ridge geometries, magmatic solitary waves and shear-driven melt bands. While development of this system has been driven by magma dynamics, this framework is much more general and can be used for a wide range of PDE-based multi-physics models.
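As a concrete illustration of the runtime composability described above, a FieldSplit Schur-complement recipe for a Stokes-type subproblem is typically selected entirely from the PETSc options database, along these lines (the field names `velocity`/`pressure` are whatever the application registers, and the exact option set is indicative rather than a verified recipe from this work):

```text
-ksp_type fgmres
-pc_type fieldsplit
-pc_fieldsplit_type schur
-pc_fieldsplit_schur_fact_type upper
-fieldsplit_velocity_ksp_type preonly
-fieldsplit_velocity_pc_type gamg
-fieldsplit_pressure_ksp_type preonly
-fieldsplit_pressure_pc_type jacobi
```

Because these choices live in the options database, a solver recipe can be swapped without recompiling the model.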
NASA Astrophysics Data System (ADS)
Weston, Brian; Nourgaliev, Robert; Delplanque, Jean-Pierre
2017-11-01
We present a new block-based Schur complement preconditioner for simulating all-speed compressible flow with phase change. The conservation equations are discretized with a reconstructed Discontinuous Galerkin method and integrated in time with fully implicit time discretization schemes. The resulting set of non-linear equations is converged using a robust Newton-Krylov framework. Due to the stiffness of the underlying physics associated with stiff acoustic waves and viscous material strength effects, we solve for the primitive variables (pressure, velocity, and temperature). To enable convergence of the highly ill-conditioned linearized systems, we develop a physics-based preconditioner, utilizing approximate block factorization techniques to reduce the fully-coupled 3×3 system to a pair of reduced 2×2 systems. We demonstrate that our preconditioned Newton-Krylov framework converges on very stiff multi-physics problems, corresponding to large CFL and Fourier numbers, with excellent algorithmic and parallel scalability. Results are shown for the classic lid-driven cavity flow problem as well as for 3D laser-induced phase change. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cyr, Eric C.; Shadid, John N.; Tuminaro, Raymond S.
This study describes the design of Teko, an object-oriented C++ library for implementing advanced block preconditioners. Mathematical design criteria that elucidate the needs of block preconditioning libraries and techniques are explained and shown to motivate the structure of Teko. For instance, a principal design choice was for Teko to strongly reflect the mathematical statement of the preconditioners to reduce development burden and permit focus on the numerics. Additional mechanisms are explained that provide a pathway to developing an optimized production-capable block preconditioning capability with Teko. Finally, Teko is demonstrated on fluid flow and magnetohydrodynamics applications. In addition to highlighting the features of the Teko library, these new results illustrate the effectiveness of recent preconditioning developments applied to advanced discretization approaches.
Cyr, Eric C.; Shadid, John N.; Tuminaro, Raymond S.
2016-10-27
Domain decomposition preconditioners for the spectral collocation method
NASA Technical Reports Server (NTRS)
Quarteroni, Alfio; Sacchilandriani, Giovanni
1988-01-01
Several block iteration preconditioners are proposed and analyzed for the solution of elliptic problems by spectral collocation methods in a region partitioned into several rectangles. It is shown that convergence is achieved with a rate which does not depend on the polynomial degree of the spectral solution. The iterative methods here presented can be effectively implemented on multiprocessor systems due to their high degree of parallelism.
NASA Technical Reports Server (NTRS)
Atkins, H. L.; Shu, Chi-Wang
2001-01-01
The explicit stability constraint of the discontinuous Galerkin method applied to the diffusion operator decreases dramatically as the order of the method is increased. Block Jacobi and block Gauss-Seidel preconditioner operators are examined for their effectiveness at accelerating convergence. A Fourier analysis for methods of order 2 through 6 reveals that both preconditioner operators bound the eigenvalues of the discrete spatial operator. Additionally, in one dimension, the eigenvalues are grouped into two or three regions that are invariant with the order of the method. Local relaxation methods are constructed that rapidly damp high frequencies for arbitrarily large time steps.
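The eigenvalue-bounding behavior described above is easy to observe numerically. For a block-tridiagonal SPD operator (a 1-D Laplacian stand-in below, not the DG discretization itself), block-Jacobi preconditioning keeps the spectrum of P^-1 A inside (0, 2):

```python
import numpy as np

# 1-D Laplacian as a block-tridiagonal stand-in for a diffusion operator
n, nb = 32, 4
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Block Jacobi: invert each nb-by-nb diagonal block
P_inv = np.zeros_like(A)
for i in range(0, n, nb):
    sl = slice(i, i + nb)
    P_inv[sl, sl] = np.linalg.inv(A[sl, sl])

lam = np.linalg.eigvals(P_inv @ A).real
# For block-tridiagonal SPD operators, the preconditioned spectrum stays
# in (0, 2), mirroring the bounded eigenvalues reported in the abstract.
```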
Layer-oriented multigrid wavefront reconstruction algorithms for multi-conjugate adaptive optics
NASA Astrophysics Data System (ADS)
Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.
2003-02-01
Multi-conjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of AO degrees of freedom. In this paper, we develop an iterative sparse matrix implementation of minimum variance wavefront reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method, using a multigrid preconditioner incorporating a layer-oriented (block) symmetric Gauss-Seidel iterative smoothing operator. We present open-loop numerical simulation results to illustrate algorithm convergence.
Mang, Andreas; Biros, George
2017-01-01
We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraint is a scalar transport equation. We use a pseudospectral discretization in space and a second-order accurate semi-Lagrangian time-stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20× speedup for a two-dimensional, real-world multi-subject medical image registration problem.
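The nested Chebyshev inner solve has the attraction that, given eigenvalue bounds, it needs no inner products (unlike CG) and runs for a fixed number of iterations, which makes its cost predictable inside a preconditioner. A generic sketch for an SPD operator with spectrum in [lmin, lmax] (not the registration code itself):

```python
import numpy as np

def chebyshev(A, b, lmin, lmax, iters=40):
    """Chebyshev iteration for SPD A with spectrum in [lmin, lmax].
    Needs no inner products, one reason it is attractive as a nested
    (inexact) inner solver with a fixed iteration count."""
    theta = 0.5 * (lmax + lmin)            # center of the spectrum
    delta = 0.5 * (lmax - lmin)            # half-width
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    x = np.zeros_like(b)
    r = b.copy()
    d = r / theta
    for _ in range(iters):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma1 - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x
```

The error contracts like ((sqrt(k)-1)/(sqrt(k)+1))^iters with k = lmax/lmin, so a fixed, small iteration count suffices when the bounds are tight.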
NASA Astrophysics Data System (ADS)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.
2018-01-01
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of the special structure of the nuclear configuration interaction problem, which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos-based algorithm for problems of moderate sizes on a Cray XC30 system.
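The block iteration described here can be illustrated with a generic preconditioned block eigensolver: repeatedly expand the trial subspace with preconditioned block residuals and extract Ritz pairs. The sketch below is a simplified stand-in, not the authors' solver, and it uses an exact solve with A in place of the paper's problem-specific preconditioner.

```python
import numpy as np

def block_eigensolver(A, M_apply, X0, niter=40):
    """Smallest eigenpairs of symmetric A via a preconditioned block
    iteration with Rayleigh-Ritz extraction (a Lanczos replacement)."""
    X, _ = np.linalg.qr(X0)
    k = X.shape[1]
    for _ in range(niter):
        H = X.T @ A @ X                    # small Rayleigh quotient matrix
        theta, Q = np.linalg.eigh(H)
        X = X @ Q                          # current Ritz vectors
        R = A @ X - X * theta              # block residual
        S = np.hstack([X, M_apply(R)])     # expand by preconditioned residual
        S, _ = np.linalg.qr(S)             # re-orthonormalize trial basis
        w, V = np.linalg.eigh(S.T @ A @ S)
        X = S @ V[:, :k]                   # keep the k smallest Ritz pairs
    return np.sort(np.diag(X.T @ A @ X)), X

# Model problem: 1-D Laplacian; an exact solve plays the preconditioner role.
n, k = 40, 3
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(0)
theta, X = block_eigensolver(A, lambda R: np.linalg.solve(A, R),
                             rng.standard_normal((n, k)))
```

Because the whole block is updated at once, each iteration is dominated by dense matrix-matrix products, which is exactly the arithmetic-intensity advantage over Lanczos that the abstract points to.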
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lott, P. Aaron; Woodward, Carol S.; Evans, Katherine J.
Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with the processes being studied. The dominant cost of an implicit time step is the ancillary linear system solves, so we have developed a preconditioner aimed at improving the efficiency of these linear system solves. Our preconditioner is based on an approximate block factorization of the linearized shallow-water equations and has been implemented within the spectral element dynamical core of the Community Atmospheric Model (CAM-SE). Furthermore, in this paper we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM-SE.
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...
2017-09-14
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of the special structure of the nuclear configuration interaction problem, which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos-based algorithm for problems of moderate sizes on a Cray XC30 system.
Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.
2018-05-01
Here, we report that multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure-preserving (also termed physics-compatible) discretizations of the Maxwell equations in first-order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell-type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves, and it compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.
Assessment of Preconditioner for a USM3D Hierarchical Adaptive Nonlinear Method (HANIM) (Invited)
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.
2016-01-01
Enhancements to the previously reported mixed-element USM3D Hierarchical Adaptive Nonlinear Iteration Method (HANIM) framework have been made to further improve the robustness, efficiency, and accuracy of computational fluid dynamics simulations. The key enhancements include a multi-color line-implicit preconditioner, a discretely consistent symmetry boundary condition, and a line-mapping method for the turbulence source term discretization. The USM3D iterative convergence for turbulent flows is assessed on four configurations: a two-dimensional (2D) bump-in-channel, the 2D NACA 0012 airfoil, a three-dimensional (3D) bump-in-channel, and a 3D hemisphere cylinder. The Reynolds-averaged Navier-Stokes (RANS) solutions have been obtained using the Spalart-Allmaras turbulence model and families of uniformly refined nested grids. Two types of HANIM solutions, using line- and point-implicit preconditioners, have been computed. Additional solutions using the point-implicit preconditioner alone (PA) method, which broadly represents the baseline solver technology, have also been computed. The line-implicit HANIM shows superior iterative convergence in most cases, with progressively increasing benefits on finer grids.
Implicit solvers for unstructured meshes
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Mavriplis, Dimitri J.
1991-01-01
Implicit methods for unstructured mesh computations are developed and tested. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned generalized minimum residual (GMRES) technique. Three different preconditioners are investigated: the incomplete LU factorization (ILU), block diagonal factorization, and symmetric successive over-relaxation (SSOR). The preconditioners have been optimized to have good vectorization properties. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also investigated. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
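The first of these preconditioner options, incomplete LU combined with GMRES, is readily illustrated with SciPy's sparse tools. This is a generic sketch on a small nonsymmetric convection-diffusion model problem, standing in for a linearized flow Jacobian; it is not the authors' unstructured-mesh solver, and the drop tolerance and fill factor are illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 2-D convection-diffusion model problem (nonsymmetric, like a linearized
# flow Jacobian) on an n-by-n grid, built from Kronecker products.
n = 32
I = sp.identity(n)
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
C = sp.diags([-0.5, 0.5], [-1, 1], shape=(n, n))   # first-order convection
A = (sp.kron(I, T + C) + sp.kron(T, I)).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU factorization wrapped as a preconditioner for GMRES.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, ilu.solve)
x, info = spla.gmres(A, b, M=M)
```

With a reasonably accurate ILU factorization, preconditioned GMRES typically converges in a handful of restarts where the unpreconditioned iteration would need many more.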
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Sala, Marzio
In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentrations. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system is obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10^8 unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.
Semi-automatic sparse preconditioners for high-order finite element methods on non-uniform meshes
NASA Astrophysics Data System (ADS)
Austin, Travis M.; Brezina, Marian; Jamroz, Ben; Jhurani, Chetan; Manteuffel, Thomas A.; Ruge, John
2012-05-01
High-order finite elements often have a higher accuracy per degree of freedom than the classical low-order finite elements. However, in the context of implicit time-stepping methods, high-order finite elements present challenges to the construction of efficient simulations due to the high cost of inverting the denser finite element matrix. There are many cases where simulations are limited by the memory required to store the matrix and/or the algorithmic components of the linear solver. We are particularly interested in preconditioned Krylov methods for linear systems generated by discretization of elliptic partial differential equations with high-order finite elements. Using a preconditioner like Algebraic Multigrid can be costly in terms of memory due to the need to store matrix information at the various levels. We present a novel method for defining a preconditioner for systems generated by high-order finite elements that is based on a much sparser system than the original high-order finite element system. We investigate the performance for non-uniform meshes on a cube and a cubed sphere mesh, showing that the sparser preconditioner is more efficient and uses significantly less memory. Finally, we explore new methods to construct the sparse preconditioner and examine their effectiveness for non-uniform meshes. We compare results to a direct use of Algebraic Multigrid as a preconditioner and to a two-level additive Schwarz method.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Implicit solvers for unstructured meshes
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Mavriplis, Dimitri J.
1991-01-01
Implicit methods were developed and tested for unstructured mesh computations. The approximate system which arises from the Newton linearization of the nonlinear evolution operator is solved by using the preconditioned GMRES (Generalized Minimum Residual) technique. Three different preconditioners were studied, namely, the incomplete LU factorization (ILU), block diagonal factorization, and the symmetric successive over-relaxation (SSOR). The preconditioners were optimized to have good vectorization properties. SSOR and ILU were also studied as iterative schemes. The various methods are compared over a wide range of problems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative methods, is also studied. Results are presented for inviscid and turbulent viscous calculations on single and multielement airfoil configurations using globally and adaptively generated meshes.
Local multiplicative Schwarz algorithms for convection-diffusion equations
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Sarkis, Marcus
1995-01-01
We develop a new class of overlapping Schwarz type algorithms for solving scalar convection-diffusion equations discretized by finite element or finite difference methods. The preconditioners consist of two components, namely, the usual two-level additive Schwarz preconditioner and the sum of some quadratic terms constructed by using products of ordered neighboring subdomain preconditioners. The ordering of the subdomain preconditioners is determined by considering the direction of the flow. We prove that the algorithms are optimal in the sense that the convergence rates are independent of the mesh size, as well as the number of subdomains. We show by numerical examples that the new algorithms are less sensitive to the direction of the flow than the classical multiplicative Schwarz algorithms, and converge faster than the additive Schwarz algorithms. Thus, the new algorithms are more suitable for fluid flow applications than the classical additive or multiplicative Schwarz algorithms.
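A one-level additive Schwarz preconditioner of the kind used as the first component above can be sketched as follows. This is a generic illustration on a 1-D Laplacian with hypothetical overlapping index blocks, not a real flow discretization and not the paper's two-level method; it simply shows how the preconditioned operator's condition number improves over the original.

```python
import numpy as np

def additive_schwarz(A, subdomains):
    """One-level additive Schwarz: M^{-1} r = sum_i R_i^T A_i^{-1} R_i r,
    where R_i restricts to the (overlapping) index set of subdomain i."""
    blocks = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in subdomains]
    def apply(r):
        z = np.zeros_like(r)
        for idx, Ainv in zip(subdomains, blocks):
            z[idx] += Ainv @ r[idx]
        return z
    return apply

# 1-D Laplacian split into five overlapping subdomains (overlap of 4 nodes).
n = 40
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [np.arange(0, 12), np.arange(8, 20), np.arange(16, 28),
              np.arange(24, 36), np.arange(32, 40)]
M_apply = additive_schwarz(A, subdomains)

# Assemble M^{-1} column by column to inspect the preconditioned spectrum.
Minv = np.column_stack([M_apply(e) for e in np.eye(n)])
eigs = np.sort(np.linalg.eigvals(Minv @ A).real)
cond_prec = eigs[-1] / eigs[0]
cond_orig = np.linalg.cond(A)
```

Because each point lies in at most two subdomains here, the largest eigenvalue of the preconditioned operator is bounded by 2; a coarse space (the "two-level" part) would additionally bound the smallest eigenvalue independently of the subdomain count.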
NASA Technical Reports Server (NTRS)
Maliassov, Serguei
1996-01-01
In this paper an algebraic substructuring preconditioner is considered for nonconforming finite element approximations of second-order elliptic problems in 3D domains with a piecewise constant diffusion coefficient. Using a substructuring idea and a block Gauss elimination, part of the unknowns is eliminated and the resulting Schur complement is preconditioned by a spectrally equivalent, very sparse matrix. In the case of a quasi-uniform tetrahedral mesh, an appropriate algebraic multigrid solver can be used to solve the problem with this matrix. Explicit estimates of condition numbers and implementation algorithms are established for the constructed preconditioner. It is shown that the condition number of the preconditioned matrix does not depend on either the mesh step size or the jump of the coefficient. Finally, numerical experiments are presented to illustrate the theory being developed.
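The substructuring step described above, eliminating one block of unknowns and working with the Schur complement of the rest, reduces to the following block linear algebra. This is a generic dense sketch on a random SPD matrix, not the paper's sparse finite-element construction or its spectrally equivalent preconditioner.

```python
import numpy as np

rng = np.random.default_rng(1)
# A generic SPD 2x2 block system: unknowns split into "interior" (1) and
# "interface" (2) sets, as in substructuring.
n1, n2 = 6, 4
B = rng.standard_normal((n1 + n2, n1 + n2))
A = B @ B.T + (n1 + n2) * np.eye(n1 + n2)      # SPD by construction
A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]
b = rng.standard_normal(n1 + n2)
b1, b2 = b[:n1], b[n1:]

# Block Gauss elimination: form the Schur complement S = A22 - A21 A11^{-1} A12,
# solve for the interface unknowns, then back-substitute for the interior ones.
S = A22 - A21 @ np.linalg.solve(A11, A12)
x2 = np.linalg.solve(S, b2 - A21 @ np.linalg.solve(A11, b1))
x1 = np.linalg.solve(A11, b1 - A12 @ x2)
x = np.concatenate([x1, x2])
```

In the paper's setting the Schur complement is never formed exactly; it is replaced by a very sparse, spectrally equivalent matrix that an algebraic multigrid solver can handle.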
Characterizing the inverses of block tridiagonal, block Toeplitz matrices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boffi, Nicholas M.; Hill, Judith C.; Reuter, Matthew G.
2014-12-04
We consider the inversion of block tridiagonal, block Toeplitz matrices and comment on the behaviour of these inverses as one moves away from the diagonal. Using matrix Möbius transformations, we first present an O(1) representation (with respect to the number of block rows and block columns) for the inverse matrix and subsequently use this representation to characterize the inverse matrix. There are four symmetry-distinct cases where the blocks of the inverse matrix (i) decay to zero on both sides of the diagonal, (ii) oscillate on both sides, (iii) decay on one side and oscillate on the other, and (iv) decay on one side and grow on the other. This characterization exposes the necessary conditions for the inverse matrix to be numerically banded and may also aid in the design of preconditioners and fast algorithms. Finally, we present numerical examples of these matrix types.
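Case (i) of this classification, decay on both sides of the diagonal, is easy to observe numerically for a diagonally dominant example. The blocks below are hypothetical stand-ins chosen for illustration, not matrices from the paper.

```python
import numpy as np

b = 2    # block size
nb = 8   # number of block rows/columns
D = np.array([[4.0, 1.0], [1.0, 4.0]])   # diagonal block (diagonally dominant)
E = -np.eye(b)                            # constant off-diagonal block

# Assemble the block tridiagonal, block Toeplitz matrix.
A = np.zeros((b * nb, b * nb))
for i in range(nb):
    A[i*b:(i+1)*b, i*b:(i+1)*b] = D
    if i + 1 < nb:
        A[i*b:(i+1)*b, (i+1)*b:(i+2)*b] = E
        A[(i+1)*b:(i+2)*b, i*b:(i+1)*b] = E

Ainv = np.linalg.inv(A)
# Frobenius norms of the blocks in the first block row of the inverse:
# they decay away from the diagonal (case (i) of the classification).
norms = [np.linalg.norm(Ainv[:b, j*b:(j+1)*b]) for j in range(nb)]
```

The decay is geometric here, which is what makes a banded approximation of the inverse, and hence a banded preconditioner, viable for this matrix class.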
NASA Technical Reports Server (NTRS)
Cain, Michael D.
1999-01-01
The goal of this thesis is to develop an efficient and robust locally preconditioned semi-coarsening multigrid algorithm for the two-dimensional Navier-Stokes equations. This thesis examines the performance of the multigrid algorithm with local preconditioning for an upwind discretization of the Navier-Stokes equations. A block Jacobi iterative scheme is used because of its ability to damp high-frequency error modes. At low Mach numbers, the performance of a flux preconditioner is investigated. The flux preconditioner utilizes a new limiting technique based on local information that was developed by Siu. Full-coarsening and semi-coarsening are examined, as well as the multigrid V-cycle and full multigrid. The numerical tests were performed on a NACA 0012 airfoil at a range of Mach numbers. The tests show that semi-coarsening with flux preconditioning is the most efficient and robust combination of coarsening strategy and iterative scheme, especially at low Mach numbers.
Low-memory iterative density fitting.
Grajciar, Lukáš
2015-07-30
A new low-memory modification of the density fitting approximation based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver is presented. The iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix that decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with only linear-scaling memory requirements. Compared with the standard density fitting implementation, up to a 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner, at a cost of only a 25% increase in computational time. The potential of the method is demonstrated by performing density functional theory calculations for a zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation.
Multilevel filtering elliptic preconditioners
NASA Technical Reports Server (NTRS)
Kuo, C. C. Jay; Chan, Tony F.; Tong, Charles
1989-01-01
A class of preconditioners for elliptic problems is presented, built on ideas borrowed from digital filtering theory and implemented on a multilevel grid structure. They are designed to be both rapidly convergent and highly parallelizable. The digital filtering viewpoint allows the use of filter design techniques for constructing elliptic preconditioners and also provides an alternative framework for understanding several other recently proposed multilevel preconditioners. Numerical results are presented to assess the convergence behavior of the new methods and to compare them with other preconditioners of multilevel type, including the usual multigrid method as a preconditioner, the hierarchical basis method, and a recent method proposed by Bramble, Pasciak, and Xu.
Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures
2008-10-01
(List-of-figures excerpt: residual histories of the WSO banded preconditioner for problems 2D 54019 HIGHK, Appu, ASIC 680k, and BUNDLE1.)
The multigrid preconditioned conjugate gradient method
NASA Technical Reports Server (NTRS)
Tatebe, Osamu
1993-01-01
A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner for the PCG method, is proposed. The multigrid method has inherent high parallelism and improves the convergence of long-wavelength error components, which is important in iterative methods. By using this method as a preconditioner for the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, a necessary condition for the multigrid method to satisfy the requirements of a PCG preconditioner is considered. Numerical experiments then show the behavior of the MGCG method and demonstrate that it is superior to both the ICCG method and the multigrid method in terms of fast convergence and high parallelism. This fast convergence is understood in terms of an eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is seen that the MGCG method converges in very few iterations and that the multigrid preconditioner is a desirable preconditioner for the conjugate gradient method.
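A minimal instance of the MGCG idea, CG with a symmetric two-grid cycle as preconditioner, can be put together with SciPy. The smoother, damping factor, and grid sizes below are illustrative choices, not those of the paper; the point is only that the preconditioned iteration needs far fewer CG steps than the plain one.

```python
import numpy as np
import scipy.sparse.linalg as spla

n = 127                                   # fine grid; coarse grid has 63 points
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
d = np.diag(A)

# Linear interpolation from the coarse grid and its Galerkin coarse operator.
nc = (n - 1) // 2
P = np.zeros((n, nc))
for i in range(nc):
    f = 2 * i + 1
    P[f, i] = 1.0
    P[f - 1, i] += 0.5
    P[f + 1, i] += 0.5
Ac_inv = np.linalg.inv(P.T @ A @ P)

def vcycle(r):
    """One symmetric two-grid cycle: damped-Jacobi smoothing, exact coarse solve."""
    x = np.zeros_like(r)
    for _ in range(2):                    # pre-smoothing
        x += 0.6 * (r - A @ x) / d
    x += P @ (Ac_inv @ (P.T @ (r - A @ x)))   # coarse-grid correction
    for _ in range(2):                    # post-smoothing (keeps M symmetric)
        x += 0.6 * (r - A @ x) / d
    return x

b = np.ones(n)
counts = {"cg": 0, "mgcg": 0}
x0, info0 = spla.cg(A, b, callback=lambda xk: counts.__setitem__("cg", counts["cg"] + 1))
M = spla.LinearOperator(A.shape, vcycle)
x1, info1 = spla.cg(A, b, M=M, callback=lambda xk: counts.__setitem__("mgcg", counts["mgcg"] + 1))
```

Equal pre- and post-smoothing sweeps with an exact coarse solve keep the implied preconditioner symmetric positive definite, which is the necessary condition the abstract refers to for use inside CG.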
A Comparison of Solver Performance for Complex Gastric Electrophysiology Models
Sathar, Shameer; Cheng, Leo K.; Trew, Mark L.
2016-01-01
Computational techniques for solving the systems of equations arising in gastric electrophysiology have not been systematically studied for solution efficiency. We present a computationally challenging problem of simulating gastric electrophysiology in anatomically realistic stomach geometries with multiple intracellular and extracellular domains. The multiscale nature of the problem and the mesh resolution required to capture geometric and functional features necessitate efficient solution methods if the problem is to be tractable. In this study, we investigated and compared several parallel preconditioners for the linear systems arising from tetrahedral discretisation of electrically isotropic and anisotropic problems, with and without stimuli. The results showed that the isotropic problem was computationally less challenging than the anisotropic problem and that the application of extracellular stimuli increased the workload considerably. Preconditioners based on block Jacobi and algebraic multigrid were found to give the best overall solution times and the lowest iteration counts, respectively. The algebraic multigrid preconditioner would be expected to perform better on large problems. PMID:26736543
A frequency dependent preconditioned wavelet method for atmospheric tomography
NASA Astrophysics Data System (ADS)
Yudytskiy, Mykhaylo; Helin, Tapio; Ramlau, Ronny
2013-12-01
Atmospheric tomography, i.e. the reconstruction of the turbulence in the atmosphere, is a main task for the adaptive optics systems of the next generation of telescopes. For extremely large telescopes, such as the European Extremely Large Telescope, this problem becomes overly complex and an efficient algorithm is needed to reduce numerical costs. Recently, a conjugate gradient method based on a wavelet parametrization of turbulence layers was introduced [5]. An iterative algorithm can only be numerically efficient when the number of iterations required for a sufficient reconstruction is low. A way to achieve this is to design an efficient preconditioner. In this paper we propose a new frequency-dependent preconditioner for the wavelet method. In the context of a multi-conjugate adaptive optics (MCAO) system simulated on OCTOPUS, the official end-to-end simulation tool of the European Southern Observatory, we demonstrate the robustness and speed of the preconditioned algorithm. We show that three iterations are sufficient for a good reconstruction.
Optimization of Regional Geodynamic Models for Mantle Dynamics
NASA Astrophysics Data System (ADS)
Knepley, M.; Isaac, T.; Jadamec, M. A.
2016-12-01
The SubductionGenerator program is used to construct high resolution, 3D regional thermal structures for mantle convection simulations using a variety of data sources, including sea floor ages and geographically referenced 3D slab locations based on seismic observations. The initial bulk temperature field is constructed using a half-space cooling model or plate cooling model, and related smoothing functions based on a diffusion length-scale analysis. In this work, we seek to improve the 3D thermal model and test different model geometries and dynamically driven flow fields using constraints from observed seismic velocities and plate motions. Through a formal adjoint analysis, we construct the primal-dual version of the multi-objective PDE-constrained optimization problem for the plate motions and seismic misfit. We have efficient, scalable preconditioners for both the forward and adjoint problems based upon a block preconditioning strategy, and a simple gradient update is used to improve the control residual. The full optimal control problem is formulated on a nested hierarchy of grids, allowing a nonlinear multigrid method to accelerate the solution.
Element-topology-independent preconditioners for parallel finite element computations
NASA Technical Reports Server (NTRS)
Park, K. C.; Alexander, Scott
1992-01-01
A family of preconditioners for the solution of finite element equations is presented, which is element-topology independent and thus applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
Tezaur, Irina K.; Tuminaro, Raymond S.; Perego, Mauro; ...
2015-01-01
We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems that arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multigrid (AMG) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the AMG preconditioner results in faster linear solve times but the ILU preconditioner exhibits better scalability. In addition, a weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. We show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the AMG preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the AMG preconditioner.
A Partitioning Algorithm for Block-Diagonal Matrices With Overlap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guy Antoine Atenekeng Kahou; Laura Grigori; Masha Sosonkina
2008-02-02
We present a graph partitioning algorithm that aims at partitioning a sparse matrix into a block-diagonal form, such that any two consecutive blocks overlap. We denote this form of the matrix as the overlapped block-diagonal matrix. The partitioned matrix is suitable for applying the explicit formulation of the Multiplicative Schwarz preconditioner (EFMS) described in [3]. The graph partitioning algorithm partitions the graph of the input matrix into K partitions, such that every partition {Omega}_i has at most two neighbors {Omega}_{i-1} and {Omega}_{i+1}. First, an ordering algorithm that reduces the matrix profile, such as the reverse Cuthill-McKee algorithm, is performed. An initial overlapped block-diagonal partition is obtained from the profile of the matrix. An iterative strategy is then used to further refine the partitioning by allowing nodes to be transferred between neighboring partitions. Experiments are performed on matrices arising from real-world applications to show the feasibility and usefulness of this approach.
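The first step of this algorithm, a profile-reducing ordering such as reverse Cuthill-McKee, is available in SciPy. The sketch below scrambles a path-graph (tridiagonal) matrix with a random permutation and shows RCM recovering a small bandwidth, from which overlapped diagonal blocks could then be cut; the matrix is illustrative only, not one of the paper's test matrices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(M):
    """Largest distance of a nonzero from the diagonal."""
    coo = M.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# A tridiagonal (path-graph) matrix scrambled by a random permutation.
n = 60
T = sp.diags([1.0, 4.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")
rng = np.random.default_rng(0)
p = rng.permutation(n)
Pm = sp.identity(n, format="csr")[p]
A = (Pm @ T @ Pm.T).tocsr()               # scrambled: large bandwidth

# Reverse Cuthill-McKee recovers an ordering with a small profile.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
Q = sp.identity(n, format="csr")[perm]
A_rcm = (Q @ A @ Q.T).tocsr()
```

Once the matrix is in banded form, consecutive index ranges with a small overlap give the overlapped block-diagonal partition that the EFMS preconditioner requires.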
NASA Astrophysics Data System (ADS)
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. 
Finally, it has proved to be very practical and efficient in the parallel context.
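The multilevel idea can be miniaturized: below, a two-grid V-cycle is used as a preconditioner for conjugate gradients on a 1-D Poisson matrix. This is only a geometric stand-in for the AMG scheme described above (which builds its coarse levels algebraically from wavefront-generated groups and targets BiCGStab); the grid size and the ω = 2/3 Jacobi smoother are illustrative choices.

```python
import numpy as np

def poisson1d(n):
    """Standard 1-D Laplacian (Dirichlet boundaries), SPD."""
    return 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def two_grid_prec(A, r, omega=2/3):
    """One symmetric two-grid V-cycle applied to the residual r:
    omega-Jacobi pre-smoothing, exact Galerkin coarse correction,
    omega-Jacobi post-smoothing."""
    n = A.shape[0]
    nc = (n - 1)//2
    P = np.zeros((n, nc))                # linear-interpolation prolongator
    for j in range(nc):
        i = 2*j + 1
        P[i-1, j], P[i, j], P[i+1, j] = 0.5, 1.0, 0.5
    d = np.diag(A)
    x = omega*r/d                        # pre-smooth from a zero guess
    res = r - A@x
    Ac = P.T@A@P                         # Galerkin coarse operator
    x += P@np.linalg.solve(Ac, P.T@res)  # coarse-grid correction
    x += omega*(r - A@x)/d               # post-smooth
    return x

def pcg(A, b, prec, tol=1e-10, maxit=500):
    """Preconditioned conjugate gradients; returns (x, iteration count)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = prec(A, r)
    p = z.copy()
    rz = r@z
    for k in range(1, maxit + 1):
        Ap = A@p
        alpha = rz/(p@Ap)
        x += alpha*p
        r -= alpha*Ap
        if np.linalg.norm(r) < tol*np.linalg.norm(b):
            return x, k
        z = prec(A, r)
        rz_new = r@z
        p = z + (rz_new/rz)*p
        rz = rz_new
    return x, maxit

A = poisson1d(63)
b = np.random.default_rng(0).standard_normal(63)
x_tg, k_tg = pcg(A, b, two_grid_prec)
x_plain, k_plain = pcg(A, b, lambda A, r: r.copy())  # unpreconditioned CG
```

The two-grid-preconditioned run needs far fewer iterations than plain CG, a small-scale analogue of the grid-independent convergence reported in the abstract.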
Generalized Preconditioned Locally Harmonic Residual Eigensolver (GPLHR) v0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
VECHARYNSKI, EUGENE; YANG, CHAO
The software contains a MATLAB implementation of the Generalized Preconditioned Locally Harmonic Residual (GPLHR) method for solving standard and generalized non-Hermitian eigenproblems. The method is particularly useful for computing a subset of eigenvalues, and their eigen- or Schur vectors, closest to a given shift. The proposed method is based on block iterations and can take advantage of a preconditioner if it is available. It does not need to perform exact shift-and-invert transformation. Standard and generalized eigenproblems are handled in a unified framework.
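For contrast with GPLHR's solve-free approach, the exact shift-and-invert transformation it avoids looks like this: power iteration on (A − σI)⁻¹ finds the eigenvalue closest to the shift σ, but every step requires an exact linear solve with the shifted matrix. This NumPy sketch with a toy diagonal matrix is purely illustrative and not part of the released software.

```python
import numpy as np

def shift_invert_iteration(A, sigma, tol=1e-10, maxit=200, seed=0):
    """Classic shift-and-invert iteration: power iteration on
    (A - sigma*I)^{-1}, whose dominant eigenvector is the eigenvector
    of A with eigenvalue closest to the shift sigma."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    B = A - sigma*np.eye(n)
    lam = sigma
    for _ in range(maxit):
        w = np.linalg.solve(B, v)   # the exact solve GPLHR is designed to avoid
        v = w/np.linalg.norm(w)
        lam_new = v@(A@v)           # Rayleigh-quotient eigenvalue estimate
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, v

# toy problem: eigenvalue of A closest to sigma = 6.5 is 7
A = np.diag([1.0, 3.0, 7.0, 10.0])
lam, v = shift_invert_iteration(A, 6.5)
```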
On optimal improvements of classical iterative schemes for Z-matrices
NASA Astrophysics Data System (ADS)
Noutsos, D.; Tzoumas, M.
2006-04-01
Many researchers have considered preconditioners, applied to linear systems whose coefficient matrix is a Z- or an M-matrix, that make the associated Jacobi and Gauss-Seidel methods converge asymptotically faster than the unpreconditioned ones. Such preconditioners are chosen so that they eliminate the off-diagonal elements of the same column or the elements of the first upper diagonal (Milaszewicz [LAA 93 (1987) 161-170]; Gunawardena et al. [LAA 154-156 (1991) 123-143]). In this work we generalize the previous preconditioners to obtain optimal methods. "Good" Jacobi and Gauss-Seidel algorithms are given, and preconditioners that eliminate more than one entry per row are also proposed and analyzed. Moreover, the behavior of the above preconditioners when combined with Krylov subspace methods is studied.
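The classical first-upper-diagonal elimination that this work generalizes can be demonstrated directly. Below, a Gunawardena-type preconditioner I + S, where S holds the negated first upper co-diagonal of a unit-diagonal M-matrix, visibly lowers the spectral radius of the Jacobi iteration matrix; the 4×4 matrix is an illustrative example, not one from the paper.

```python
import numpy as np

def jacobi_spectral_radius(A):
    """Spectral radius of the Jacobi iteration matrix D^{-1}(D - A)."""
    D = np.diag(np.diag(A))
    return max(abs(np.linalg.eigvals(np.linalg.solve(D, D - A))))

# a small unit-diagonal M-matrix (non-positive off-diagonals, diagonally dominant)
A = np.array([[ 1.  , -0.25, -0.25, -0.25],
              [-0.25,  1.  , -0.25, -0.25],
              [-0.25, -0.25,  1.  , -0.25],
              [-0.25, -0.25, -0.25,  1.  ]])

n = A.shape[0]
S = np.zeros_like(A)
for i in range(n - 1):
    S[i, i+1] = -A[i, i+1]     # negate the first upper co-diagonal
PA = (np.eye(n) + S) @ A       # preconditioned coefficient matrix

rho_before = jacobi_spectral_radius(A)
rho_after = jacobi_spectral_radius(PA)
```

Because A has unit diagonal, the entries (PA)[i, i+1] are eliminated exactly, and the Jacobi spectral radius drops (here from 0.75 to roughly 0.66), matching the asymptotic-speedup claim the abstract starts from.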
3-D modeling of ductile tearing using finite elements: Computational aspects and techniques
NASA Astrophysics Data System (ADS)
Gullerud, Arne Stewart
This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolute sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process.
The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
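The coloring idea behind the HW preconditioner scheduling — elements of the same color share no nodes and can therefore be processed concurrently — can be sketched with a plain greedy coloring. The algorithm described above is weighted and load-balanced; this unweighted largest-degree-first version only illustrates the constraint being enforced.

```python
def greedy_coloring(adjacency):
    """Greedy graph coloring: vertices with the same color share no edge,
    so same-colored elements can be updated in parallel. A simplified
    stand-in for the weighted, load-balanced coloring described above."""
    colors = {}
    # color high-degree vertices first (a common greedy heuristic)
    for v in sorted(adjacency, key=lambda v: -len(adjacency[v])):
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# four elements in a row; neighboring elements share a node,
# so they may not receive the same color
mesh = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colors = greedy_coloring(mesh)
```

For this chain of elements two colors suffice, so the preconditioner application proceeds in two parallel sweeps.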
TerraFERMA: Harnessing Advanced Computational Libraries in Earth Science
NASA Astrophysics Data System (ADS)
Wilson, C. R.; Spiegelman, M.; van Keken, P.
2012-12-01
Many important problems in Earth sciences can be described by non-linear coupled systems of partial differential equations. These "multi-physics" problems include thermo-chemical convection in Earth and planetary interiors, interactions of fluids and magmas with the Earth's mantle and crust and coupled flow of water and ice. These problems are of interest to a large community of researchers but are complicated to model and understand. Much of this complexity stems from the nature of multi-physics where small changes in the coupling between variables or constitutive relations can lead to radical changes in behavior, which in turn affect critical computational choices such as discretizations, solvers and preconditioners. To make progress in understanding such coupled systems requires a computational framework where multi-physics problems can be described at a high-level while maintaining the flexibility to easily modify the solution algorithm. Fortunately, recent advances in computational science provide a basis for implementing such a framework. Here we present the Transparent Finite Element Rapid Model Assembler (TerraFERMA), which leverages several advanced open-source libraries for core functionality. FEniCS (fenicsproject.org) provides a high level language for describing the weak forms of coupled systems of equations, and an automatic code generator that produces finite element assembly code. PETSc (www.mcs.anl.gov/petsc) provides a wide range of scalable linear and non-linear solvers that can be composed into effective multi-physics preconditioners. SPuD (amcg.ese.ic.ac.uk/Spud) is an application neutral options system that provides both human and machine-readable interfaces based on a single xml schema. Our software integrates these libraries and provides the user with a framework for exploring multi-physics problems. A single options file fully describes the problem, including all equations, coefficients and solver options. 
Custom compiled applications are generated from this file but share an infrastructure for services common to all models, e.g. diagnostics, checkpointing and global non-linear convergence monitoring. This maximizes code reusability, reliability and longevity, ensuring that scientific results and the methods used to acquire them are transparent and reproducible. TerraFERMA has been tested against many published geodynamic benchmarks including 2D/3D thermal convection problems, the subduction zone benchmarks and benchmarks for magmatic solitary waves. It is currently being used in the investigation of reactive cracking phenomena with applications to carbon sequestration, but we will principally discuss its use in modeling the migration of fluids in subduction zones. Subduction zones require an understanding of the highly nonlinear interactions of fluids with solids and thus provide an excellent scientific driver for the development of multi-physics software.
Dynamic implicit 3D adaptive mesh refinement for non-equilibrium radiation diffusion
NASA Astrophysics Data System (ADS)
Philip, B.; Wang, Z.; Berrill, M. A.; Birke, M.; Pernice, M.
2014-04-01
The time dependent non-equilibrium radiation diffusion equations are important for solving the transport of energy through radiation in optically thick regimes and find applications in several fields including astrophysics and inertial confinement fusion. The associated initial boundary value problems that are encountered often exhibit a wide range of scales in space and time and are extremely challenging to solve. To efficiently and accurately simulate these systems we describe our research on combining techniques that will also find use more broadly for long term time integration of nonlinear multi-physics systems: implicit time integration for efficient long term time integration of stiff multi-physics systems, local control theory based step size control to minimize the required global number of time steps while controlling accuracy, dynamic 3D adaptive mesh refinement (AMR) to minimize memory and computational costs, Jacobian Free Newton-Krylov methods on AMR grids for efficient nonlinear solution, and optimal multilevel preconditioner components that provide level independent solver convergence.
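The key Jacobian-Free Newton-Krylov ingredient mentioned above — never forming the Jacobian — rests on approximating Jacobian-vector products by a finite difference of the nonlinear residual. A minimal sketch follows; the residual F and the perturbation scaling are illustrative assumptions, not the radiation-diffusion residual itself.

```python
import numpy as np

def jfnk_matvec(F, u, v, eps=1e-7):
    """Jacobian-free matrix-vector product: J(u) v is approximated by a
    first-order finite difference of the nonlinear residual F, so the
    Jacobian is never formed or stored (the heart of JFNK methods)."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(v)
    h = eps/norm_v                   # simple perturbation scaling
    return (F(u + h*v) - F(u))/h

# toy nonlinear residual with a known Jacobian for checking
def F(u):
    return np.array([u[0]**2 + u[1], np.sin(u[1]) + u[0]])

u = np.array([1.0, 0.5])
v = np.array([0.3, -0.2])
# exact Jacobian at u: [[2*u0, 1], [1, cos(u1)]]
J = np.array([[2*u[0], 1.0], [1.0, np.cos(u[1])]])
```

A Krylov solver such as GMRES only ever needs such products, which is what makes JFNK attractive on dynamically refined AMR grids where assembling a Jacobian would be costly.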
Eigenvalue Solvers for Modeling Nuclear Reactors on Leadership Class Machines
Slaybaugh, R. N.; Ramirez-Zweiger, M.; Pandya, Tara; ...
2018-02-20
In this paper, three complementary methods have been implemented in the code Denovo that accelerate neutral particle transport calculations with methods that use leadership-class computers fully and effectively: a multigroup block (MG) Krylov solver, a Rayleigh quotient iteration (RQI) eigenvalue solver, and a multigrid in energy (MGE) preconditioner. The MG Krylov solver converges more quickly than Gauss-Seidel and enables energy decomposition such that Denovo can scale to hundreds of thousands of cores. RQI should converge in fewer iterations than power iteration (PI) for large and challenging problems. RQI creates shifted systems that would not be tractable without the MG Krylov solver. It also creates ill-conditioned matrices. The MGE preconditioner reduces iteration count significantly when used with RQI and takes advantage of the new energy decomposition such that it can scale efficiently. Each individual method has been described before, but this is the first time they have been demonstrated to work together effectively. The combination of solvers enables the RQI eigenvalue solver to work better than the other available solvers for large reactor problems on leadership-class machines. Using these methods together, RQI converged in fewer iterations and in less time than PI for a full pressurized water reactor core. These solvers also performed better than an Arnoldi eigenvalue solver for a reactor benchmark problem when energy decomposition is needed. The MG Krylov, MGE preconditioner, and RQI solver combination also scales well in energy. Finally, this solver set is a strong choice for very large and challenging problems.
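The core RQI update described above — solve a system shifted by the current eigenvalue estimate, then renormalize — can be sketched in a few lines of NumPy. This is a dense toy, not Denovo's transport-operator setting; the near-singular shifted solves are exactly why the abstract pairs RQI with a robust Krylov solver and preconditioner.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, maxit=50, tol=1e-12):
    """Rayleigh quotient iteration. Each step solves a system shifted by
    the current eigenvalue estimate; that shifted matrix becomes nearly
    singular (badly conditioned) as the iterate converges, which is why
    robust inner solvers/preconditioners are needed in practice."""
    x = x0/np.linalg.norm(x0)
    rho = x@(A@x)
    for _ in range(maxit):
        rho = x@(A@x)                       # Rayleigh quotient
        try:
            y = np.linalg.solve(A - rho*np.eye(A.shape[0]), x)
        except np.linalg.LinAlgError:       # exactly singular: converged
            break
        x = y/np.linalg.norm(y)
        if np.linalg.norm(A@x - rho*x) < tol:
            break
    return rho, x

# illustrative symmetric matrix
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
rho, x = rayleigh_quotient_iteration(A, np.ones(3))
```

For symmetric problems RQI converges cubically, which is the "mathematically optimal" behavior referred to above; the price is the ill-conditioned shifted system at every step.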
Analysis of physics-based preconditioning for single-phase subchannel equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hansel, J. E.; Ragusa, J. C.; Allu, S.
2013-07-01
The (single-phase) subchannel approximations are used throughout nuclear engineering to provide efficient flow simulation: the computational burden is much smaller than for computational fluid dynamics (CFD) simulations, and empirical relations have been developed and validated to provide accurate solutions in appropriate flow regimes. Here, the subchannel equations have been recast in a residual form suitable for a multi-physics framework. The eigenvalue spectrum of the Jacobian matrix, along with several potential physics-based preconditioning approaches, is evaluated, and the potential for improved convergence from preconditioning is assessed. The physics-based preconditioner options include several forms of reduced equations that decouple the subchannels by neglecting crossflow, conduction, and/or both turbulent momentum and energy exchange between subchannels. Eigenvalue analysis shows that preconditioning moves clusters of eigenvalues away from zero and toward one. A test problem is run with and without preconditioning. Without preconditioning, the solution failed to converge using GMRES, but application of any of the preconditioners allowed the solution to converge.
Numerical Solution of the Gyrokinetic Poisson Equation in TEMPEST
NASA Astrophysics Data System (ADS)
Dorr, Milo; Cohen, Bruce; Cohen, Ronald; Dimits, Andris; Hittinger, Jeffrey; Kerbel, Gary; Nevins, William; Rognlien, Thomas; Umansky, Maxim; Xiong, Andrew; Xu, Xueqiao
2006-10-01
The gyrokinetic Poisson (GKP) model in the TEMPEST continuum gyrokinetic edge plasma code yields the electrostatic potential due to the charge density of electrons and an arbitrary number of ion species including the effects of gyroaveraging in the limit kρ ≪ 1. The TEMPEST equations are integrated as a differential algebraic system involving a nonlinear system solve via Newton-Krylov iteration. The GKP preconditioner block is inverted using a multigrid preconditioned conjugate gradient (CG) algorithm. Electrons are treated as kinetic or adiabatic. The Boltzmann relation in the adiabatic option employs flux surface averaging to maintain neutrality within field lines and is solved self-consistently with the GKP equation. A decomposition procedure circumvents the near singularity of the GKP Jacobian block that otherwise degrades CG convergence.
Multi-stage decoding for multi-level block modulation codes
NASA Technical Reports Server (NTRS)
Lin, Shu
1991-01-01
In this paper, we investigate various types of multi-stage decoding for multi-level block modulation codes, in which the decoding of a component code at each stage can be either soft-decision or hard-decision, maximum-likelihood or bounded-distance. Error performance of codes is analyzed for a memoryless additive channel based on various types of multi-stage decoding, and upper bounds on the probability of an incorrect decoding are derived. Based on our study and computation results, we find that, if the component codes of a multi-level modulation code and the types of decoding at the various stages are chosen properly, high spectral efficiency and large coding gain can be achieved with reduced decoding complexity. In particular, we find that the difference in performance between the suboptimum multi-stage soft-decision maximum-likelihood decoding of a modulation code and the single-stage optimum decoding of the overall code is very small: only a fraction of a dB loss in SNR at a block decoding error probability of 10^-6. Multi-stage decoding of multi-level modulation codes thus offers a way to achieve the best of three worlds: bandwidth efficiency, coding gain, and decoding complexity.
Construction, classification and parametrization of complex Hadamard matrices
NASA Astrophysics Data System (ADS)
Szöllősi, Ferenc
To improve the design of nuclear systems, high-fidelity neutron fluxes are required. Leadership-class machines provide platforms on which very large problems can be solved. Computing such fluxes efficiently requires numerical methods with good convergence properties and algorithms that can scale to hundreds of thousands of cores. Many 3-D deterministic transport codes are decomposable in space and angle only, limiting them to tens of thousands of cores. Most codes rely on methods such as Gauss-Seidel for fixed source problems and power iteration for eigenvalue problems, which can be slow to converge for challenging problems like those with highly scattering materials or high dominance ratios. Three methods have been added to the 3-D SN transport code Denovo that are designed to improve convergence and enable the full use of cutting-edge computers. The first is a multigroup Krylov solver that converges more quickly than Gauss-Seidel and parallelizes the code in energy such that Denovo can use hundreds of thousands of cores effectively. The second is Rayleigh quotient iteration (RQI), an old method applied in a new context. This eigenvalue solver finds the dominant eigenvalue in a mathematically optimal way and should converge in fewer iterations than power iteration. RQI creates energy-block-dense equations that the new Krylov solver treats efficiently. However, RQI can have convergence problems because it creates poorly conditioned systems. This can be overcome with preconditioning. The third method is a multigrid-in-energy preconditioner. The preconditioner takes advantage of the new energy decomposition because the grids are in energy rather than space or angle. The preconditioner greatly reduces iteration count for many problem types and scales well in energy. It also allows RQI to be successful for problems it could not solve otherwise. The methods added to Denovo accomplish the goals of this work.
They converge in fewer iterations than traditional methods and enable the use of hundreds of thousands of cores. Each method can be used individually, with the multigroup Krylov solver and multigrid-in-energy preconditioner being particularly successful on their own. The largest benefit, though, comes from using these methods in concert.
Two new modified Gauss-Seidel methods for linear system with M-matrices
NASA Astrophysics Data System (ADS)
Zheng, Bing; Miao, Shu-Xin
2009-12-01
In 2002, H. Kotakemori et al. proposed the modified Gauss-Seidel (MGS) method for solving the linear system with the preconditioner (I + S_max) [H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner (I + S_max), J. Comput. Appl. Math. 145 (2002) 373-378]. Since this preconditioner is constructed from only the largest element in each row of the upper triangular part of the coefficient matrix, no preconditioning effect is obtained on the nth row. In the present paper, to deal with this drawback, we propose two new preconditioners. The convergence and comparison theorems of the modified Gauss-Seidel methods with these two preconditioners for solving the linear system are established. The convergence rates of the newly proposed preconditioned methods are compared. In addition, numerical experiments are used to show the effectiveness of the new MGS methods.
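The largest-element preconditioner being improved upon can be reproduced in a few lines: for a unit-diagonal M-matrix, each row of S negates the largest-modulus entry in the strictly upper triangular part of that row (note there is nothing to select on the last row, which is the drawback the paper addresses), and the Gauss-Seidel spectral radius drops. The 4×4 matrix below is an illustrative example, not one from the paper.

```python
import numpy as np

def gs_spectral_radius(A):
    """Spectral radius of the Gauss-Seidel iteration matrix
    (D - L)^{-1} U, where A = D - L - U."""
    DL = np.tril(A)            # D - L
    U = -np.triu(A, 1)
    return max(abs(np.linalg.eigvals(np.linalg.solve(DL, U))))

# illustrative unit-diagonal, strictly diagonally dominant M-matrix
A = np.array([[ 1.0, -0.2, -0.4, -0.1],
              [-0.3,  1.0, -0.2, -0.4],
              [-0.1, -0.4,  1.0, -0.2],
              [-0.4, -0.1, -0.3,  1.0]])

# largest-element preconditioner: row i of S negates the largest-modulus
# entry of the strictly upper triangular part of row i (no entry for row n)
n = A.shape[0]
S = np.zeros_like(A)
for i in range(n - 1):
    j = i + 1 + np.argmax(np.abs(A[i, i+1:]))
    S[i, j] = -A[i, j]
PA = (np.eye(n) + S) @ A       # preconditioned coefficient matrix

rho_plain = gs_spectral_radius(A)
rho_prec = gs_spectral_radius(PA)
```

The targeted entries are eliminated exactly (unit diagonal), and for this example the Gauss-Seidel spectral radius falls from roughly 0.59 to roughly 0.45, the kind of improvement the comparison theorems formalize.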
Multi-stage decoding for multi-level block modulation codes
NASA Technical Reports Server (NTRS)
Lin, Shu; Kasami, Tadao
1991-01-01
Various types of multi-stage decoding for multi-level block modulation codes, in which the decoding of a component code at each stage can be either soft-decision or hard-decision, maximum-likelihood or bounded-distance, are discussed. Error performance of codes is analyzed for a memoryless additive channel based on various types of multi-stage decoding, and upper bounds on the probability of an incorrect decoding are derived. It was found that, if the component codes of a multi-level modulation code and the types of decoding at the various stages are chosen properly, high spectral efficiency and large coding gain can be achieved with reduced decoding complexity. It was also found that the difference in performance between the suboptimum multi-stage soft-decision maximum-likelihood decoding of a modulation code and the single-stage optimum decoding of the overall code is very small: only a fraction of a dB loss in SNR at a block decoding error probability of 10^-6. Multi-stage decoding of multi-level modulation codes thus offers a way to achieve the best of three worlds: bandwidth efficiency, coding gain, and decoding complexity.
An overview of NSPCG: A nonsymmetric preconditioned conjugate gradient package
NASA Astrophysics Data System (ADS)
Oppe, Thomas C.; Joubert, Wayne D.; Kincaid, David R.
1989-05-01
The most recent research-oriented software package developed as part of the ITPACK Project is called "NSPCG" since it contains many nonsymmetric preconditioned conjugate gradient procedures. It is designed to solve large sparse systems of linear algebraic equations by a variety of different iterative methods. One of the main purposes for the development of the package is to provide a common modular structure for research on iterative methods for nonsymmetric matrices. Another purpose for the development of the package is to investigate the suitability of several iterative methods for vector computers. Since the vectorizability of an iterative method depends greatly on the matrix structure, NSPCG allows great flexibility in the operator representation. The coefficient matrix can be passed in one of several different matrix data storage schemes. These sparse data formats allow matrices with a wide range of structures from highly structured ones such as those with all nonzeros along a relatively small number of diagonals to completely unstructured sparse matrices. Alternatively, the package allows the user to call the accelerators directly with user-supplied routines for performing certain matrix operations. In this case, one can use the data format from an application program and not be required to copy the matrix into one of the package formats. This is particularly advantageous when memory space is limited. Some of the basic preconditioners that are available are point methods such as Jacobi, Incomplete LU Decomposition and Symmetric Successive Overrelaxation as well as block and multicolor preconditioners. The user can select from a large collection of accelerators such as Conjugate Gradient (CG), Chebyshev (SI, for semi-iterative), Generalized Minimal Residual (GMRES), Biconjugate Gradient Squared (BCGS) and many others. The package is modular so that almost any accelerator can be used with almost any preconditioner.
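NSPCG's option of calling the accelerators directly with user-supplied routines for matrix operations amounts to a matrix-free interface: the solver never sees a stored matrix, only a callback. The sketch below shows that interface style in Python (not NSPCG's actual Fortran API); the application keeps its own data format and applies a stencil on the fly.

```python
import numpy as np

def cg(matvec, b, tol=1e-10, maxit=1000):
    """Conjugate gradients that touches the operator only through a
    user-supplied matvec callable -- the 'user-supplied routine'
    interface style described above. No matrix is ever stored."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rr = r@r
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rr/(p@Ap)
        x += alpha*p
        r -= alpha*Ap
        rr_new = r@r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new/rr)*p
        rr = rr_new
    return x

def laplace_matvec(v):
    """Tridiagonal (-1, 2, -1) stencil applied without forming a matrix."""
    w = 2.0*v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

b = np.ones(50)
x = cg(laplace_matvec, b)
```

This is the memory-saving design the abstract highlights: when space is limited, the application's native format is used directly rather than being copied into one of the package's storage schemes.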
NASA Astrophysics Data System (ADS)
Heinkenschloss, Matthias
2005-01-01
We study a class of time-domain decomposition-based methods for the numerical solution of large-scale linear quadratic optimal control problems. Our methods are based on a multiple shooting reformulation of the linear quadratic optimal control problem as a discrete-time optimal control (DTOC) problem. The optimality conditions for this DTOC problem lead to a linear block tridiagonal system. The diagonal blocks are invertible and are related to the original linear quadratic optimal control problem restricted to smaller time-subintervals. This motivates the application of block Gauss-Seidel (GS)-type methods for the solution of the block tridiagonal systems. Numerical experiments show that the spectral radii of the block GS iteration matrices are larger than one for typical applications, but that the eigenvalues of the iteration matrices decay to zero fast. Hence, while the GS method is not expected to converge for typical applications, it can be effective as a preconditioner for Krylov-subspace methods. This is confirmed by our numerical tests. A byproduct of this research is the insight that certain instantaneous control techniques can be viewed as the application of one step of the forward block GS method applied to the DTOC optimality system.
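The observation that block GS can fail as a stationary method yet work well as a preconditioner has a structural ingredient that is easy to exhibit: one forward block-GS sweep reproduces the first block column of the system exactly, so a batch of eigenvalues of the iteration matrix sit at zero (equivalently, eigenvalues of the preconditioned operator sit at one). The NumPy sketch below uses a hypothetical random block tridiagonal system, not the DTOC optimality system itself.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 4, 3                      # N diagonal blocks of size m
n = N*m
A = np.zeros((n, n))
for i in range(N):
    s = slice(i*m, (i+1)*m)
    A[s, s] = 4*np.eye(m) + 0.1*rng.standard_normal((m, m))
    if i + 1 < N:
        t = slice((i+1)*m, (i+2)*m)
        A[s, t] = 0.5*rng.standard_normal((m, m))   # super-diagonal block
        A[t, s] = 0.5*rng.standard_normal((m, m))   # sub-diagonal block

# forward block Gauss-Seidel splitting: keep only the block lower triangle
M = A.copy()
for i in range(N):
    for j in range(i + 1, N):
        M[i*m:(i+1)*m, j*m:(j+1)*m] = 0.0

# block GS iteration matrix T = I - M^{-1} A; its first block column
# vanishes because A and M agree on that column
T = np.eye(n) - np.linalg.solve(M, A)
mags = np.sort(np.abs(np.linalg.eigvals(T)))
```

The guaranteed zero eigenvalues (and, in the paper's setting, the rapid decay of the remaining ones) are what make a Krylov method on the block-GS-preconditioned system converge quickly even when the stationary iteration itself diverges.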
On polynomial preconditioning for indefinite Hermitian matrices
NASA Technical Reports Server (NTRS)
Freund, Roland W.
1989-01-01
The minimal residual method combined with polynomial preconditioning is studied for solving large linear systems (Ax = b) with indefinite Hermitian coefficient matrices (A). The standard approach for choosing the polynomial preconditioners leads to preconditioned systems which are positive definite. Here, a different strategy is studied which leaves the preconditioned coefficient matrix indefinite. More precisely, the polynomial preconditioner is designed to cluster the positive, resp. negative eigenvalues of A around 1, resp. around some negative constant. In particular, it is shown that such indefinite polynomial preconditioners can be obtained as the optimal solutions of a certain two-parameter family of Chebyshev approximation problems. Some basic results are established for these approximation problems and a Remez-type algorithm is sketched for their numerical solution. The problem of selecting the parameters such that the resulting indefinite polynomial preconditioner optimally speeds up the convergence of the minimal residual method is also addressed. An approach is proposed based on the concept of asymptotic convergence factors. Finally, some numerical examples of indefinite polynomial preconditioners are given.
NASA Astrophysics Data System (ADS)
Ghafouri, H. R.; Mosharaf-Dehkordi, M.; Afzalan, B.
2017-07-01
A simulation-optimization model is proposed for identifying the characteristics of local immiscible NAPL contaminant sources inside aquifers. This model employs the UTCHEM 9.0 software as its simulator for solving the governing equations associated with multi-phase flow in porous media. As the optimization model, a novel two-level saturation-based Imperialist Competitive Algorithm (ICA) is proposed to estimate the parameters of contaminant sources. The first level consists of three parallel independent ICAs and serves as a preconditioner for the second level, which is a single modified ICA. The ICA in the second level is modified by dividing each country into a number of provinces (smaller parts). Similar to countries in the classical ICA, these provinces are optimized by the assimilation, competition, and revolution steps of the ICA. To increase the diversity of populations, a new approach named "knock the base" is proposed. The performance and accuracy of the simulation-optimization model are assessed by solving a set of two- and three-dimensional problems considering the effects of different parameters such as the grid size, rock heterogeneity and designated monitoring networks. The obtained numerical results indicate that this simulation-optimization model provides accurate results in fewer iterations when compared with the model employing the classical one-level ICA. Highlights: A model is proposed to identify characteristics of immiscible NAPL contaminant sources. The contaminant is immiscible in water and multi-phase flow is simulated. The model is a multi-level saturation-based optimization algorithm based on ICA. Each answer string in the second level is divided into a set of provinces. Each ICA is modified by incorporating the new "knock the base" approach.
Incomplete augmented Lagrangian preconditioner for steady incompressible Navier-Stokes equations.
Tan, Ning-Bo; Huang, Ting-Zhu; Hu, Ze-Jun
2013-01-01
An incomplete augmented Lagrangian preconditioner, for the steady incompressible Navier-Stokes equations discretized by stable finite elements, is proposed. The eigenvalues of the preconditioned matrix are analyzed. Numerical experiments show that the incomplete augmented Lagrangian-based preconditioner proposed is very robust and performs quite well by the Picard linearization or the Newton linearization over a wide range of values of the viscosity on both uniform and stretched grids.
NASA Technical Reports Server (NTRS)
Bayliss, A.; Goldstein, C. I.; Turkel, E.
1984-01-01
The Helmholtz equation (-Δ - K²n²)u = 0 with a variable index of refraction, n, and a suitable radiation condition at infinity serves as a model for a wide variety of wave propagation problems. A numerical algorithm was developed and a computer code implemented that can effectively solve this equation in the intermediate frequency range. The equation is discretized using the finite element method, thus allowing for the modeling of complicated geometries (including interfaces) and complicated boundary conditions. A global radiation boundary condition is imposed at the far-field boundary that is exact for an arbitrary number of propagating modes. The resulting large, non-selfadjoint system of linear equations with indefinite symmetric part is solved using the preconditioned conjugate gradient method applied to the normal equations. A new preconditioner is developed based on the multigrid method. This preconditioner is vectorizable and is extremely effective over a wide range of frequencies provided the number of grid levels is reduced for large frequencies. A heuristic argument is given that indicates the superior convergence properties of this preconditioner.
Till, Andrew T.; Warsa, James S.; Morel, Jim E.
2018-06-15
The thermal radiative transfer (TRT) equations comprise a radiation equation coupled to the material internal energy equation. Linearization of these equations produces effective, thermally-redistributed scattering through absorption-reemission. In this paper, we investigate the effectiveness and efficiency of Linear-Multi-Frequency-Grey (LMFG) acceleration that has been reformulated for use as a preconditioner to Krylov iterative solution methods. We introduce two general frameworks, the scalar flux formulation (SFF) and the absorption rate formulation (ARF), and investigate their iterative properties in the absence and presence of true scattering. SFF has a group-dependent state size but may be formulated without inner iterations in the presence of scattering, while ARF has a group-independent state size but requires inner iterations when scattering is present. We compare and evaluate the computational cost and efficiency of LMFG applied to these two formulations using a direct solver for the preconditioners. Finally, this work is novel because the use of LMFG for the radiation transport equation, in conjunction with Krylov methods, involves special considerations not required for radiation diffusion.
Fast Multilevel Solvers for a Class of Discrete Fourth Order Parabolic Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng, Bin; Chen, Luoping; Hu, Xiaozhe
2016-03-05
In this paper, we study fast iterative solvers for the solution of fourth order parabolic equations discretized by mixed finite element methods. We propose to use consistent mass matrix in the discretization and use lumped mass matrix to construct efficient preconditioners. We provide eigenvalue analysis for the preconditioned system and estimate the convergence rate of the preconditioned GMRES method. Furthermore, we show that these preconditioners only need to be solved inexactly by optimal multigrid algorithms. Our numerical examples indicate that the proposed preconditioners are very efficient and robust with respect to both discretization parameters and diffusion coefficients. We also investigate the performance of multigrid algorithms with either collective smoothers or distributive smoothers when solving the preconditioner systems.
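The lumped-mass idea above — precondition systems involving the consistent mass matrix with the trivially invertible lumped one — can be illustrated in 1-D. The bounds below are for linear elements on a uniform mesh, a far simpler setting than the paper's mixed discretization, but they show the key property: the preconditioned eigenvalues stay in a fixed interval independent of the mesh size.

```python
import numpy as np

def consistent_mass(n, h=1.0):
    """1-D linear-FE consistent mass matrix on a uniform mesh
    (interior nodes, homogeneous Dirichlet boundary)."""
    return (h/6.0)*(4.0*np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))

def lumped_mass(n, h=1.0):
    """Row-sum mass lumping: a diagonal matrix, trivial to invert."""
    return np.diag(consistent_mass(n, h).sum(axis=1))

# spectral equivalence: the eigenvalues of L^{-1} M lie in [1/3, 1] for
# every mesh size, so the lumped matrix is a uniform (mesh-independent)
# preconditioner for the consistent one
bounds = []
for n in (10, 40, 160):
    M, L = consistent_mass(n), lumped_mass(n)
    ev = np.real(np.linalg.eigvals(np.linalg.solve(L, M)))
    bounds.append((ev.min(), ev.max()))
```

Mesh-independent spectral bounds of this kind are what translate into the discretization-parameter-robust convergence rates reported in the abstract.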
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weston, Brian T.
This dissertation focuses on the development of a fully-implicit, high-order compressible flow solver with phase change. The work is motivated by laser-induced phase change applications, particularly by the need to develop large-scale multi-physics simulations of the selective laser melting (SLM) process in metal additive manufacturing (3D printing). Simulations of the SLM process require precise tracking of multi-material solid-liquid-gas interfaces, due to laser-induced melting/solidification and evaporation/condensation of metal powder in an ambient gas. These rapid density variations and phase change processes tightly couple the governing equations, requiring a fully compressible framework to robustly capture the rapid density variations of the ambient gas and the melting/evaporation of the metal powder. For non-isothermal phase change, the velocity is gradually suppressed through the mushy region by a variable viscosity and Darcy source term model. The governing equations are discretized up to 4th-order accuracy with our reconstructed Discontinuous Galerkin spatial discretization scheme and up to 5th-order accuracy with L-stable fully implicit time discretization schemes (BDF2 and ESDIRK3-5). The resulting set of non-linear equations is solved using a robust Newton-Krylov method, with the Jacobian-free version of the GMRES solver for linear iterations. Due to the stiffness associated with the acoustic waves and thermal and viscous/material strength effects, preconditioning the GMRES solver is essential. A robust and scalable approximate block factorization preconditioner was developed, which utilizes the velocity-pressure (vP) and velocity-temperature (vT) Schur complement systems.
This multigrid block reduction preconditioning technique converges for high CFL/Fourier numbers and exhibits excellent parallel and algorithmic scalability on classic benchmark problems in fluid dynamics (lid-driven cavity flow and natural convection heat transfer) as well as for laser-induced phase change problems in 2D and 3D.
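The Jacobian-free Newton-Krylov mechanics described above can be sketched generically: GMRES only needs matrix-vector products with the Jacobian, which are approximated by finite differences of the residual. The toy residual F(u) = Au + u^3 - 1 below is an illustrative assumption, not the dissertation's compressible flow model.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def residual(u):
    # Illustrative stiff nonlinear residual (a stand-in for the flow equations).
    n = u.size
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return A @ u + u**3 - 1.0

def jfnk(u, newton_tol=1e-8, eps=1e-7, max_newton=50):
    for _ in range(max_newton):
        F = residual(u)
        if np.linalg.norm(F) < newton_tol:
            break
        # Jacobian-free product: J v ~= (F(u + eps*v) - F(u)) / eps.
        J = LinearOperator((u.size, u.size),
                           matvec=lambda v: (residual(u + eps * v) - F) / eps)
        du, _ = gmres(J, -F)  # inexact Newton step
        u = u + du
    return u

u = jfnk(np.zeros(8))
```

A production solver would add the preconditioner inside the GMRES call; here the system is small enough to converge unpreconditioned.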
Tang, Cheng-fang; Fang, Ming; Liu, Rui-rui; Dou, Qi; Chai, Zhi-guo; Xiao, Yu-hong; Chen, Ji-hua
2013-12-01
Grape seed extract (GSE) is known to have a positive effect on the demineralization and/or remineralization of artificial root caries lesions. The present study aimed to investigate whether biomodification of caries-like acid-etched demineralized dentine, using proanthocyanidin-rich GSE, would promote its remineralization potential. Dentine specimens were acid-etched for 30 s, then biomodified using proanthocyanidin-based preconditioners (at different concentrations and pH values) for 2 min, followed by a 15-day artificial remineralization regimen. They were subsequently subjected to microhardness measurements, micromorphological evaluation and X-ray diffraction analyses. The stability of the preconditioners was also analyzed spectrophotometrically. A concentration-dependent increase was observed in the microhardness of the specimens that were biomodified using GSE preconditioners without pH adjustment. Field emission scanning electron microscopy revealed greater mineral deposition on their surfaces, which was further identified mainly as hydroxylapatite. The absorbances of preconditioner dilutions at pH 7.4 and pH 10.0 decreased at the two typical polyphenol bands. Transient GSE biomodification promoted remineralization on the surface of demineralized dentine, and this process was influenced by the concentration and pH value of the preconditioner. GSE preconditioner at a concentration of 15%, without pH adjustment, presented the best results, which may be attributed to its high polyphenolic content.
A universal preconditioner for simulating condensed phase materials.
Packwood, David; Kermode, James; Mones, Letif; Bernstein, Noam; Woolley, John; Gould, Nicholas; Ortner, Christoph; Csányi, Gábor
2016-04-28
We introduce a universal sparse preconditioner that accelerates geometry optimisation and saddle point search tasks that are common in the atomic scale simulation of materials. Our preconditioner is based on the neighbourhood structure and we demonstrate the gain in computational efficiency in a wide range of materials that include metals, insulators, and molecular solids. The simple structure of the preconditioner means that the gains can be realised in practice not only when using expensive electronic structure models but also for fast empirical potentials. Even for relatively small systems of a few hundred atoms, we observe speedups of a factor of two or more, and the gain grows with system size. An open source Python implementation within the Atomic Simulation Environment is available, offering interfaces to a wide range of atomistic codes.
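The neighbourhood-based construction can be illustrated schematically: couple every pair of atoms within a cutoff through a graph-Laplacian stencil, stabilise with a small diagonal shift so the matrix is positive definite, and use it to rescale gradient directions. The cutoff and the coefficients mu and c below are placeholders, not the published parameterisation.

```python
import numpy as np

def neighbour_preconditioner(positions, r_cut=1.5, mu=1.0, c=0.1):
    # Graph Laplacian over atom pairs within r_cut, shifted by c*I so the
    # preconditioner is positive definite (mu, c, r_cut are illustrative).
    n = len(positions)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) < r_cut:
                P[i, i] += mu
                P[j, j] += mu
                P[i, j] -= mu
                P[j, i] -= mu
    return P + c * np.eye(n)

# A short 1D chain of "atoms" with unit spacing.
pos = np.arange(6, dtype=float)[:, None]
P = neighbour_preconditioner(pos)

# A preconditioned optimiser steps along d = -P^{-1} g instead of -g.
g = np.linspace(-1.0, 1.0, 6)
d = np.linalg.solve(P, -g)
```

The point of the diagonal shift is exactly what the sketch shows: without c*I the Laplacian is singular (constant shifts of all atoms cost nothing), so the solve for d would fail.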
Multilevel Preconditioners for Reaction-Diffusion Problems with Discontinuous Coefficients
Kolev, Tzanio V.; Xu, Jinchao; Zhu, Yunrong
2015-08-23
In this study, we extend some of the multilevel convergence results obtained by Xu and Zhu to the case of second order linear reaction-diffusion equations. Specifically, we consider multilevel preconditioners for solving the linear systems arising from the linear finite element approximation of the problem, where both diffusion and reaction coefficients are piecewise-constant functions. We discuss in detail the influence of both the discontinuous reaction and diffusion coefficients on the performance of the classical BPX and multigrid V-cycle preconditioners.
Incomplete Gröbner basis as a preconditioner for polynomial systems
NASA Astrophysics Data System (ADS)
Sun, Yang; Tao, Yu-Hui; Bai, Feng-Shan
2009-04-01
Preconditioning plays a critical role in numerical methods for large, sparse linear systems, and the same is true for nonlinear algebraic systems. In this paper an incomplete Gröbner basis (IGB) is proposed as a preconditioner of homotopy methods for polynomial systems of equations, which transforms a deficient system into a system with the same finite solutions but smaller degree. The reduced system can thus be solved faster. Numerical results show the efficiency of the preconditioner.
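The effect of such preconditioning is easy to see on a toy system with SymPy's Gröbner machinery (a complete basis here, not the incomplete variant the paper proposes): a lex-order basis is triangular, so the coupled polynomial system collapses to back-substitution.

```python
from sympy import symbols, groebner, sqrt, simplify

x, y = symbols("x y")
# Toy system: the unit circle intersected with the line x = y.
F = [x**2 + y**2 - 1, x - y]
G = groebner(F, x, y, order="lex")
polys = list(G.exprs)

# Any common zero of F must annihilate every element of the basis,
# since the basis generates the same ideal.
sol = {x: sqrt(2) / 2, y: sqrt(2) / 2}
residuals = [simplify(p.subs(sol)) for p in polys]
```

For this system the lex basis contains a univariate polynomial in y, so the solve proceeds one variable at a time; the IGB idea trades some of this structure for a cheaper, partial reduction.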
Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model
NASA Astrophysics Data System (ADS)
Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin
2016-08-01
This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is more pronounced at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that spin-level performance is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16 cores. Multi-GPU performance is highly convenient under a weak scaling setting, reaching up to 99% efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended the sizes we can simulate to L = 32, 64 on a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
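The temperature-exchange step at the heart of the method is compact: a swap between the replicas at adjacent temperatures i and i+1 is accepted with probability min(1, exp((beta_i - beta_{i+1})(E_i - E_{i+1}))). A minimal sketch, with scalar energies standing in for full RFIM configurations:

```python
import numpy as np

rng = np.random.default_rng(1)

def exchange_sweep(energies, betas):
    # order[k] = index of the replica currently held at temperature k.
    order = np.arange(len(betas))
    for i in range(len(betas) - 1):
        dE = energies[order[i]] - energies[order[i + 1]]
        d = (betas[i] - betas[i + 1]) * dE
        # Metropolis acceptance for the replica swap.
        if d >= 0 or rng.random() < np.exp(d):
            order[i], order[i + 1] = order[i + 1], order[i]
    return order

# With equal energies every swap is accepted, so one pass cycles the replicas.
order = exchange_sweep(np.zeros(3), np.array([1.0, 0.5, 0.1]))
```

The adaptive mid-point insertion described in the abstract would monitor the acceptance rate of each (i, i+1) pair over many sweeps and insert a new temperature wherever that rate collapses.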
Graph Embedding Techniques for Bounding Condition Numbers of Incomplete Factor Preconditioning
NASA Technical Reports Server (NTRS)
Guattery, Stephen
1997-01-01
We extend graph embedding techniques for bounding the spectral condition number of preconditioned systems involving symmetric, irreducibly diagonally dominant M-matrices to systems where the preconditioner is not diagonally dominant. In particular, this allows us to bound the spectral condition number when the preconditioner is based on an incomplete factorization. We provide a review of previous techniques, describe our extension, and give examples both of a bound for a model problem and of ways in which our techniques give an intuitive way of looking at incomplete factor preconditioners.
Solving Graph Laplacian Systems Through Recursive Bisections and Two-Grid Preconditioning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ponce, Colin; Vassilevski, Panayot S.
2016-02-18
We present a parallelizable direct method for computing the solution to graph Laplacian-based linear systems derived from graphs that can be hierarchically bipartitioned with small edge cuts. For a graph of size n with constant-size edge cuts, our method decomposes a graph Laplacian in time O(n log n), and then uses that decomposition to perform a linear solve in time O(n log n). We then use the developed technique to design a preconditioner for graph Laplacians that do not have this property. Finally, we augment this preconditioner with a two-grid method that accounts for much of the preconditioner's weaknesses. We present an analysis of this method, as well as a general theorem for the condition number of a general class of two-grid support graph-based preconditioners. Numerical experiments illustrate the performance of the studied methods.
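Graph Laplacians are singular (constant vectors span the nullspace), so a common baseline, shown here with plain Jacobi preconditioning rather than the paper's bisection/two-grid construction, is to ground one vertex and solve the reduced SPD system with preconditioned CG:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

# Laplacian of a path graph on n vertices (unit edge weights).
n = 100
main = np.full(n, 2.0)
main[0] = main[-1] = 1.0
L = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csr")

# Ground vertex 0 to remove the constant nullspace.
Lg = L[1:, 1:].tocsr()
d = Lg.diagonal()
M = LinearOperator((n - 1, n - 1), matvec=lambda v: v / d)  # Jacobi preconditioner

b = np.zeros(n - 1)
b[0], b[-1] = 1.0, -1.0   # inject current at two vertices
x, info = cg(Lg, b, M=M)
```

A path graph is a worst case for Jacobi (condition number grows like n^2), which is precisely the gap the recursive-bisection decomposition and two-grid correction aim to close.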
A note on the preconditioner Pm=(I+Sm)
NASA Astrophysics Data System (ADS)
Kohno, Toshiyuki; Niki, Hiroshi
2009-03-01
Kotakemori et al. [H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner (I+Smax), Journal of Computational and Applied Mathematics 145 (2002) 373-378] reported that the convergence rate of the iterative method with the preconditioner Pm=(I+Sm) was superior to that of the modified Gauss-Seidel method under certain conditions, and derived a theorem comparing the Gauss-Seidel method with the proposed method. However, by means of a counter example, Wen Li [Wen Li, A note on the preconditioned Gauss-Seidel (GS) method for linear systems, Journal of Computational and Applied Mathematics 182 (2005) 81-91] pointed out that there exists a special matrix that does not satisfy this comparison theorem. In this note, we analyze why such a counter example can arise, and propose a preconditioner that overcomes this problem.
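The preconditioner in question modifies the system before iterating: P = I + S with S carrying the scaled first superdiagonal, S[i, i+1] = -a[i, i+1]/a[i, i], after which Gauss-Seidel is applied to PAx = Pb. A small sketch on a diagonally dominant matrix (an assumed example, not one from the cited papers):

```python
import numpy as np

def gauss_seidel(A, b, x, sweeps):
    # Forward Gauss-Seidel sweeps on A x = b.
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# Strictly diagonally dominant test matrix (assumed for illustration).
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

# P = I + S with S[i, i+1] = -a[i, i+1]/a[i, i]: eliminates the superdiagonal.
S = np.zeros_like(A)
for i in range(len(A) - 1):
    S[i, i + 1] = -A[i, i + 1] / A[i, i]
P = np.eye(len(A)) + S

# Gauss-Seidel on the preconditioned system P A x = P b.
x = gauss_seidel(P @ A, P @ b, np.zeros(3), sweeps=25)
```

Since P is unit upper triangular it is invertible, so PAx = Pb has the same solution as Ax = b; the comparison theorems concern how much faster the sweep converges on the modified system.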
Preconditioned Mixed Spectral Element Methods for Elasticity and Stokes Problems
NASA Technical Reports Server (NTRS)
Pavarino, Luca F.
1996-01-01
Preconditioned iterative methods for the indefinite systems obtained by discretizing the linear elasticity and Stokes problems with mixed spectral elements in three dimensions are introduced and analyzed. The resulting stiffness matrices have the structure of saddle point problems with a penalty term, which is associated with the Poisson ratio for elasticity problems or with stabilization techniques for Stokes problems. The main results of this paper show that the convergence rate of the resulting algorithms is independent of the penalty parameter and the number of spectral elements N, and mildly dependent on the spectral degree n via the inf-sup constant. The preconditioners proposed for the whole indefinite system are block-diagonal and block-triangular. Numerical experiments presented in the final section show that these algorithms are a practical and efficient strategy for the iterative solution of the indefinite problems arising from mixed spectral element discretizations of elliptic systems.
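A minimal sketch of the block-diagonal idea: build a symmetric saddle point system with a penalty term and hand MINRES a block-diagonal preconditioner. The blocks, sizes, and the crude diagonal approximations below are assumptions for illustration, not the paper's spectral element operators.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import minres, LinearOperator

# Symmetric saddle point system with penalty term: [[A, B^T], [B, -t*C]].
n, m, t = 40, 10, 1e-2
A = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
B = sp.random(m, n, density=0.3, random_state=0)
C = sp.eye(m)
K = sp.bmat([[A, B.T], [B, -t * C]], format="csr")

# Block-diagonal preconditioner diag(A_hat, S_hat); here A_hat = diag(A),
# S_hat = I, both chosen for simplicity rather than spectral equivalence.
dA = A.diagonal()
def apply_M(v):
    out = v.copy()
    out[:n] = v[:n] / dA
    return out
M = LinearOperator((n + m, n + m), matvec=apply_M)

b = np.ones(n + m)
x, info = minres(K, b, M=M)
```

MINRES is the natural Krylov method here because K is symmetric indefinite while the preconditioner is SPD, matching the setting analyzed in the abstract.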
Efficiency and flexibility using implicit methods within atmosphere dycores
NASA Astrophysics Data System (ADS)
Evans, K. J.; Archibald, R.; Norman, M. R.; Gardner, D. J.; Woodward, C. S.; Worley, P.; Taylor, M.
2016-12-01
A suite of explicit and implicit methods are evaluated for a range of configurations of the shallow water dynamical core within the spectral-element Community Atmosphere Model (CAM-SE) to explore their relative computational performance. The configurations are designed to explore the attributes of each method under different but relevant model usage scenarios including varied spectral order within an element, static regional refinement, and scaling to large problem sizes. The limitations and benefits of using explicit versus implicit, with different discretizations and parameters, are discussed in light of trade-offs such as MPI communication, memory, and inherent efficiency bottlenecks. For the regionally refined shallow water configurations, the implicit BDF2 method is about the same efficiency as an explicit Runge-Kutta method, without including a preconditioner. Performance of the implicit methods with the residual function executed on a GPU is also presented; there is a speedup for the residual relative to a CPU, but overwhelming transfer costs motivate moving more of the solver to the device. Given the performance behavior of implicit methods within the shallow water dynamical core, the recommendation for future work using implicit solvers is conditional based on scale separation and the stiffness of the problem. The strong growth of linear iterations with increasing resolution or time step size is the main bottleneck to computational efficiency. Within the hydrostatic dynamical core of CAM-SE, we present results utilizing approximate block factorization preconditioners implemented using the Trilinos library of solvers. They reduce the cost of linear system solves and improve parallel scalability.
We provide a summary of the remaining efficiency considerations within the preconditioner and utilization of the GPU, as well as a discussion about the benefits of a time stepping method that provides converged and stable solutions for a much wider range of time step sizes. As more complex model components, for example new physics and aerosols, are connected in the model, having flexibility in the time stepping will enable more options for combining and resolving multiple scales of behavior.
Un-collided-flux preconditioning for the first order transport equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rigley, M.; Koebbe, J.; Drumm, C.
2013-07-01
Two codes were tested for the first-order neutron transport equation using finite element methods; the un-collided-flux solution is used as a preconditioner in each. These codes include a least-squares finite element method and a discontinuous finite element method. The performance of each code is shown on problems in one and two dimensions. The un-collided-flux preconditioner shows good speedup on each of the given methods. The un-collided-flux preconditioner has been used on the second-order equation, and here we extend those results to the first-order equation. (authors)
An iterative solver for the 3D Helmholtz equation
NASA Astrophysics Data System (ADS)
Belonosov, Mikhail; Dmitriev, Maxim; Kostin, Victor; Neklyudov, Dmitry; Tcheverda, Vladimir
2017-09-01
We develop a frequency-domain iterative solver for numerical simulation of acoustic waves in 3D heterogeneous media. It is based on the application of a unique preconditioner to the Helmholtz equation that ensures convergence for Krylov subspace iteration methods. Effective inversion of the preconditioner involves the Fast Fourier Transform (FFT) and numerical solution of a series of boundary value problems for ordinary differential equations. Matrix-by-vector multiplication for iterative inversion of the preconditioned matrix involves inversion of the preconditioner and pointwise multiplication of grid functions. Our solver has been verified by benchmarking against exact solutions and a time-domain solver.
NASA Astrophysics Data System (ADS)
Debreu, Laurent; Neveu, Emilie; Simon, Ehouarn; Le Dimet, Francois Xavier; Vidard, Arthur
2014-05-01
In order to lower the computational cost of the variational data assimilation process, we investigate the use of multigrid methods to solve the associated optimal control system. On a linear advection equation, we study the impact of the regularization term on the optimal control and the impact of discretization errors on the efficiency of the coarse grid correction step. We show that even if the optimal control problem leads to the solution of an elliptic system, numerical errors introduced by the discretization can alter the success of the multigrid methods. The view of the multigrid iteration as a preconditioner for a Krylov optimization method leads to a more robust algorithm. A scale dependent weighting of the multigrid preconditioner and the usual background error covariance matrix based preconditioner is proposed and brings significant improvements. [1] Laurent Debreu, Emilie Neveu, Ehouarn Simon, François-Xavier Le Dimet and Arthur Vidard, 2014: Multigrid solvers and multigrid preconditioners for the solution of variational data assimilation problems, submitted to QJRMS, http://hal.inria.fr/hal-00874643 [2] Emilie Neveu, Laurent Debreu and François-Xavier Le Dimet, 2011: Multigrid methods and data assimilation - Convergence study and first experiments on non-linear equations, ARIMA, 14, 63-80, http://intranet.inria.fr/international/arima/014/014005.html
NASA Astrophysics Data System (ADS)
Ciarlet, P.
1994-09-01
Hereafter, we describe and analyze, from both a theoretical and a numerical point of view, an iterative method for efficiently solving symmetric elliptic problems with possibly discontinuous coefficients. In the following, we use the Preconditioned Conjugate Gradient method to solve the symmetric positive definite linear systems which arise from the finite element discretization of the problems. We focus our interest on sparse and efficient preconditioners. In order to define the preconditioners, we perform two steps: first we reorder the unknowns and then we carry out a (modified) incomplete factorization of the original matrix. We study numerically and theoretically two preconditioners, the second preconditioner corresponding to the one investigated by Brand and Heinemann [2]. We prove convergence results about the Poisson equation with either Dirichlet or periodic boundary conditions. For a mesh size h, Brand proved that the condition number of the preconditioned system is bounded by O(h^{-1/2}) for Dirichlet boundary conditions. By slightly modifying the preconditioning process, we prove that the condition number is bounded by O(h^{-1/3}).
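The two-step recipe, reorder then incompletely factor, can be sketched with SciPy's generic spilu, which applies an ILU with its own fill-reducing permutation. This is a stand-in for the modified incomplete factorization analysed in the text, and the Krylov method here is GMRES because spilu's factors are not guaranteed symmetric.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, LinearOperator, gmres

# 2D Poisson operator (5-point stencil) on an nx-by-nx grid.
nx = 20
T = sp.diags([-np.ones(nx - 1), 2 * np.ones(nx), -np.ones(nx - 1)], [-1, 0, 1])
A = sp.kronsum(T, T).tocsc()

# Incomplete LU factorization with a drop tolerance (illustrative settings).
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)

b = np.ones(A.shape[0])
x, info = gmres(A, b, M=M)
```

With a drop tolerance this small the factorization is nearly exact and the preconditioned iteration converges in a few steps; loosening drop_tol trades setup cost for more iterations.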
Performance Models for the Spike Banded Linear System Solver
Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...
2011-01-01
With the availability of large-scale parallel platforms comprising tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners, compared to state-of-the-art ILU-family preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we predict the scalability of the Spike algorithm to up to 65,536 cores using our model. This paper extends the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
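The banded kernel underneath such a solver is the standard LAPACK-style storage and solve; a Spike-type scheme partitions the band into blocks, solves them independently with exactly this kernel, and couples them through a small reduced interface system. The single-partition building block looks like:

```python
import numpy as np
from scipy.linalg import solve_banded

# Tridiagonal system in LAPACK banded storage
# (rows: superdiagonal, main diagonal, subdiagonal).
n = 8
ab = np.zeros((3, n))
ab[0, 1:] = -1.0   # superdiagonal
ab[1, :] = 4.0     # main diagonal
ab[2, :-1] = -1.0  # subdiagonal
b = np.arange(1.0, n + 1)
x = solve_banded((1, 1), ab, b)

# Dense reconstruction of the same matrix, for verification only.
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
```

Banded storage keeps only the 2k+1 diagonals of a bandwidth-k matrix, which is what makes the per-partition work both cache-friendly and easy to model analytically.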
Rapidly converging multigrid reconstruction of cone-beam tomographic data
NASA Astrophysics Data System (ADS)
Myers, Glenn R.; Kingston, Andrew M.; Latham, Shane J.; Recur, Benoit; Li, Thomas; Turner, Michael L.; Beeching, Levi; Sheppard, Adrian P.
2016-10-01
In the context of large-angle cone-beam tomography (CBCT), we present a practical iterative reconstruction (IR) scheme designed for rapid convergence as required for large datasets. The robustness of the reconstruction is provided by the "space-filling" source trajectory along which the experimental data is collected. The speed of convergence is achieved by leveraging the highly isotropic nature of this trajectory to design an approximate deconvolution filter that serves as a preconditioner in a multi-grid scheme. We demonstrate this IR scheme for CBCT and compare convergence to that of more traditional techniques.
Efficient solvers for coupled models in respiratory mechanics.
Verdugo, Francesc; Roth, Christian J; Yoshihara, Lena; Wall, Wolfgang A
2017-02-01
We present efficient preconditioners for one of the most physiologically relevant pulmonary models currently available. Our underlying motivation is to enable the efficient simulation of such a lung model on high-performance computing platforms, in order to assess mechanical ventilation strategies and to contribute to the design of more protective patient-specific ventilation treatments. The system of linear equations to be solved using the proposed preconditioners is essentially the monolithic system arising in fluid-structure interaction (FSI), extended by additional algebraic constraints. The introduction of these constraints leads to a saddle point problem that cannot be solved with the usual FSI preconditioners available in the literature. The key ingredient in this work is to use the idea of the semi-implicit method for pressure-linked equations (SIMPLE) to get rid of the saddle point structure, resulting in a standard FSI problem that can be treated with available techniques. The numerical examples show that the resulting preconditioners approach the optimal performance of multigrid methods, even though the lung model is a complex multiphysics problem. Moreover, the preconditioners are robust enough to deal with physiologically relevant simulations involving complex real-world patient-specific lung geometries. The same approach is applicable to other challenging biomedical applications where coupling between flow and tissue deformations is modeled with additional algebraic constraints. Copyright © 2016 John Wiley & Sons, Ltd.
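The SIMPLE idea referenced above, predict with the (1,1) block, correct the constraint variable with a diagonal-based approximate Schur complement, can be sketched as a preconditioner for GMRES on a generic saddle point system. The sizes and blocks below are illustrative assumptions, not the lung model.

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

# Saddle point system [[A, B^T], [B, 0]] (illustrative blocks).
rng = np.random.default_rng(0)
n, m = 30, 8
A = (np.diag(3.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
B = rng.standard_normal((m, n))
K = np.block([[A, B.T], [B, np.zeros((m, m))]])

Dinv = 1.0 / np.diag(A)
S = B @ (Dinv[:, None] * B.T)   # approximate Schur complement B diag(A)^{-1} B^T

def simple_apply(r):
    ru, rp = r[:n], r[n:]
    u_star = np.linalg.solve(A, ru)            # predictor with the (1,1) block
    p = np.linalg.solve(S, B @ u_star - rp)    # constraint (pressure) correction
    u = u_star - Dinv * (B.T @ p)              # corrector
    return np.concatenate([u, p])

M = LinearOperator(K.shape, matvec=simple_apply)
b = np.ones(n + m)
x, info = gmres(K, b, M=M)
```

The only approximation in this application is replacing A by its diagonal inside the Schur complement and the corrector, which is exactly why the preconditioned spectrum clusters and GMRES converges in few iterations.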
Array-based, parallel hierarchical mesh refinement algorithms for unstructured meshes
Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...
2016-08-18
In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes, such as in multigrid solvers/preconditioners, to do solution convergence and verification studies, and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and refining in memory). We also describe a high-order boundary reconstruction capability that can be used to project the new points after refinement using high-order approximations instead of linear projection, in order to minimize and provide more control on geometrical errors introduced by curved boundaries. The capability is developed under the parallel unstructured mesh framework "Mesh Oriented dAtaBase" (MOAB, Tautges et al. (2004)). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. Furthermore, we also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.
Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.
1990-01-01
The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized using central differences, and an algebraic turbulence model is incorporated. The system of linear equations which arises from the linearization of a fully implicit scheme is solved iteratively by the well-known GMRES (Generalized Minimum Residual) method and Chebyshev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes, and has wide applications in parallel computing and unstructured mesh computations.
Algorithms for parallel and vector computations
NASA Technical Reports Server (NTRS)
Ortega, James M.
1995-01-01
This is a final report on work performed under NASA grant NAG-1-1112-FOP during the period March 1990 through February 1995. Four major topics are covered: (1) solution of nonlinear Poisson-type equations; (2) parallel reduced system conjugate gradient method; (3) orderings for conjugate gradient preconditioners; and (4) SOR as a preconditioner.
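Item (4), SOR as a preconditioner, usually means the symmetric variant SSOR, whose application is just two triangular sweeps. A sketch with CG on a 1D Laplacian (the relaxation parameter w is chosen arbitrarily, and this is an illustrative model problem, not the report's):

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import cg, LinearOperator

# SPD model problem: 1D Laplacian, split as A = L + D + L^T.
n, w = 50, 1.2
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
dA = np.diag(A)
F = np.diag(dA) / w + np.tril(A, -1)   # lower-triangular SSOR factor D/w + L

def ssor_solve(r):
    # Apply M^{-1} r for M = (1/(w(2-w))) (D/w + L) D^{-1} (D/w + L)^T:
    # forward sweep, diagonal scaling, backward sweep.
    a = solve_triangular(F, r, lower=True)
    z = solve_triangular(F.T, dA * a, lower=False)
    return w * (2.0 - w) * z

M = LinearOperator((n, n), matvec=ssor_solve)
b = np.ones(n)
x, info = cg(A, b, M=M)
```

SSOR (unlike plain SOR) yields a symmetric positive definite preconditioner for 0 < w < 2, which is what makes it admissible inside CG.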
Novel numerical techniques for magma dynamics
NASA Astrophysics Data System (ADS)
Rhebergen, S.; Katz, R. F.; Wathen, A.; Alisic, L.; Rudge, J. F.; Wells, G.
2013-12-01
We discuss the development of finite element techniques and solvers for magma dynamics computations, implemented within the FEniCS framework. This approach allows for user-friendly, expressive, high-level code development, but also provides access to powerful, scalable numerical solvers and a large family of finite element discretisations. With the recent addition of dolfin-adjoint, FEniCS supports automated adjoint and tangent-linear models, enabling the rapid development of generalised stability analysis. The ability to easily scale codes to three dimensions with large meshes, and/or to apply intricate adjoint calculations, means that efficiency of the numerical algorithms is vital. We therefore describe our development and analysis of preconditioners designed specifically for finite element discretisations of equations governing magma dynamics. The preconditioners are based on Elman-Silvester-Wathen methods for the Stokes equation, and we extend these to flows with compaction. Our simulations are validated by comparison of results with laboratory experiments on partially molten aggregates.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers
NASA Technical Reports Server (NTRS)
Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)
1997-01-01
The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from the robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely-used NASA multi-block Computational Fluid Dynamics (CFD) packages implemented in ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages are identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains using up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft simulations achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. 
The performance behavior on the other computer platforms with a variety of realistic problems will be reported as this ongoing study progresses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
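The accuracy-dependent role described above (direct solver versus preconditioner) can be illustrated with a simpler algebraic factorization. The sketch below substitutes SciPy's incomplete LU for the paper's hierarchical low-rank solver, which is an assumption for illustration only: a near-exact factorization lets the Krylov iteration converge almost immediately, playing the role of a direct solve.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sketch: an approximate factorization used as a preconditioner for GMRES.
# (Incomplete LU stands in here for the paper's hierarchical low-rank solver;
# a tight drop tolerance makes it accurate enough to act as a direct solver.)
n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

def gmres_iters(M=None):
    count = {"it": 0}
    x, _ = spla.gmres(A, b, M=M,
                      callback=lambda rk: count.__setitem__("it", count["it"] + 1))
    return x, count["it"]

x_plain, it_plain = gmres_iters()                # no preconditioner
ilu = spla.spilu(A, drop_tol=1e-8)               # near-exact factorization
M = spla.LinearOperator((n, n), ilu.solve)
x_prec, it_prec = gmres_iters(M)                 # preconditioned: far fewer iterations
```

A looser `drop_tol` would give a cheaper, less accurate factorization, trading per-iteration cost against iteration count, which mirrors the accuracy knob described in the abstract.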
On decoding of multi-level MPSK modulation codes
NASA Technical Reports Server (NTRS)
Lin, Shu; Gupta, Alok Kumar
1990-01-01
The decoding problem of multi-level block modulation codes is investigated. The hardware design of a soft-decision Viterbi decoder for some short-length 8-PSK block modulation codes is presented. An effective way to reduce the hardware complexity of the decoder by reducing the branch and path metrics, using a non-uniform floating-point-to-integer mapping scheme, is proposed and discussed. Simulation results of the design are presented. The multi-stage decoding (MSD) of multi-level modulation codes is also investigated. The cases of soft-decision and hard-decision MSD are considered and their performance is evaluated for several codes of different lengths and different minimum squared Euclidean distances. It is shown that soft-decision MSD drastically reduces the decoding complexity, although it is suboptimum. Hard-decision MSD further simplifies the decoding while still maintaining a reasonable coding gain over the uncoded system, provided that the component codes are chosen properly. Finally, some basic 3-level 8-PSK modulation codes using BCH codes as component codes are constructed and their coding gains are found for hard-decision multistage decoding.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kalashnikova, Irina
2012-05-01
A numerical study evaluating different preconditioners within the Trilinos Ifpack and ML packages for the Quantum Computer Aided Design (QCAD) non-linear Poisson problem, implemented within the Albany code base and posed on the Ottawa Flat 270 design geometry, is performed. This study led to some new development in Albany that allows the user to select an ML preconditioner with Zoltan repartitioning based on nodal coordinates, which is summarized. Convergence of the numerical solutions computed within the QCAD computational suite under successive mesh refinement is examined in two metrics: the mean value of the solution (an L^1 norm) and the field integral of the solution (an L^2 norm).
Tensor-product preconditioners for higher-order space-time discontinuous Galerkin methods
NASA Astrophysics Data System (ADS)
Diosady, Laslo T.; Murman, Scott M.
2017-02-01
A space-time discontinuous-Galerkin spectral-element discretization is presented for direct numerical simulation of the compressible Navier-Stokes equations. An efficient solution technique based on a matrix-free Newton-Krylov method is developed in order to overcome the stiffness associated with high solution order. The use of tensor-product basis functions is key to maintaining efficiency at high-order. Efficient preconditioning methods are presented which can take advantage of the tensor-product formulation. A diagonalized Alternating-Direction-Implicit (ADI) scheme is extended to the space-time discontinuous Galerkin discretization. A new preconditioner for the compressible Euler/Navier-Stokes equations based on the fast-diagonalization method is also presented. Numerical results demonstrate the effectiveness of these preconditioners for the direct numerical simulation of subsonic turbulent flows.
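The fast-diagonalization idea referenced above can be shown on a small tensor-product (Sylvester-type) system; this is the classical technique, sketched with dense stand-in operators rather than the authors' actual space-time discretization. Eigendecomposing the small per-direction operators once reduces each solve to two dense transforms and a pointwise division.

```python
import numpy as np

# Fast diagonalization for A X + X B = C, i.e. (I (x) A + B^T (x) I) vec(X) = vec(C).
# A and B are small symmetric positive definite 1-D operators (synthetic here).
rng = np.random.default_rng(0)
n = 8

def spd(n):
    Q = rng.standard_normal((n, n))
    return Q @ Q.T + n * np.eye(n)      # symmetric positive definite

A, B = spd(n), spd(n)
C = rng.standard_normal((n, n))

lam, V = np.linalg.eigh(A)              # A = V diag(lam) V^T
mu, W = np.linalg.eigh(B)               # B = W diag(mu) W^T
Chat = V.T @ C @ W                      # transform the right-hand side
X = V @ (Chat / (lam[:, None] + mu[None, :])) @ W.T   # pointwise solve, transform back
```

Because the eigendecompositions are computed once per direction, applying the solve costs only a few dense matrix products, which is what makes such operators attractive as preconditioners at high order.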
Tensor-Product Preconditioners for Higher-Order Space-Time Discontinuous Galerkin Methods
NASA Technical Reports Server (NTRS)
Diosady, Laslo T.; Murman, Scott M.
2016-01-01
A space-time discontinuous-Galerkin spectral-element discretization is presented for direct numerical simulation of the compressible Navier-Stokes equations. An efficient solution technique based on a matrix-free Newton-Krylov method is developed in order to overcome the stiffness associated with high solution order. The use of tensor-product basis functions is key to maintaining efficiency at high order. Efficient preconditioning methods are presented which can take advantage of the tensor-product formulation. A diagonalized Alternating-Direction-Implicit (ADI) scheme is extended to the space-time discontinuous Galerkin discretization. A new preconditioner for the compressible Euler/Navier-Stokes equations based on the fast-diagonalization method is also presented. Numerical results demonstrate the effectiveness of these preconditioners for the direct numerical simulation of subsonic turbulent flows.
Ghafouri, H R; Mosharaf-Dehkordi, M; Afzalan, B
2017-07-01
A simulation-optimization model is proposed for identifying the characteristics of local immiscible NAPL contaminant sources inside aquifers. This model employs the UTCHEM 9.0 software as its simulator for solving the governing equations associated with multi-phase flow in porous media. As the optimization model, a novel two-level saturation-based Imperialist Competitive Algorithm (ICA) is proposed to estimate the parameters of contaminant sources. The first level consists of three parallel independent ICAs and acts as a pre-conditioner for the second level, which is a single modified ICA. The ICA in the second level is modified by dividing each country into a number of provinces (smaller parts). Similar to countries in the classical ICA, these provinces are optimized by the assimilation, competition, and revolution steps of the ICA. To increase the diversity of populations, a new approach named the knock-the-base method is proposed. The performance and accuracy of the simulation-optimization model are assessed by solving a set of two- and three-dimensional problems considering the effects of different parameters such as the grid size, rock heterogeneity, and designated monitoring networks. The obtained numerical results indicate that this simulation-optimization model provides accurate results in fewer iterations than the model employing the classical one-level ICA. Copyright © 2017 Elsevier B.V. All rights reserved.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. Although these algorithms exhibit excellent scalability, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix-vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
NASA Astrophysics Data System (ADS)
Betté, Srinivas; Diaz, Julio C.; Jines, William R.; Steihaug, Trond
1986-11-01
A preconditioned residual-norm-reducing iterative solver is described. Based on a truncated form of the generalized-conjugate-gradient method for nonsymmetric systems of linear equations, the iterative scheme is very effective for linear systems generated in reservoir simulation of thermal oil recovery processes. As a consequence of employing an adaptive implicit finite-difference scheme to solve the model equations, the number of variables per cell-block varies dynamically over the grid. The data structure allows for 5- and 9-point operators in the areal model, 5-point in the cross-sectional model, and 7- and 11-point operators in the three-dimensional model. Block-diagonal-scaling of the linear system, done prior to iteration, is found to have a significant effect on the rate of convergence. Block-incomplete-LU-decomposition (BILU) and block-symmetric-Gauss-Seidel (BSGS) methods, which result in no fill-in, are used as preconditioning procedures. A full factorization is done on the well terms, and the cells are ordered in a manner which minimizes the fill-in in the well-column due to this factorization. The convergence criterion for the linear (inner) iteration is linked to that of the nonlinear (Newton) iteration, thereby enhancing the efficiency of the computation. The algorithm, with both BILU and BSGS preconditioners, is evaluated in the context of a variety of thermal simulation problems. The solver is robust and can be used with little or no user intervention.
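The block-diagonal scaling performed prior to iteration can be illustrated in a few lines; the block sizes and scalings below are hypothetical, chosen only to show why removing per-block magnitude differences improves the conditioning of the linear system before the Krylov iteration starts.

```python
import numpy as np

# Sketch of block-diagonal scaling prior to iteration: invert each diagonal
# block of the matrix and apply it, removing per-block magnitude differences.
rng = np.random.default_rng(1)
nb, bs = 12, 3                        # cell blocks x variables per block (hypothetical)
n = nb * bs
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))      # well-conditioned core
scales = np.repeat(10.0 ** (np.arange(nb) % 4), bs)    # widely varying block magnitudes
A = scales[:, None] * M

D_inv = np.zeros((n, n))
for k in range(nb):                   # invert each bs-by-bs diagonal block
    s = slice(k * bs, (k + 1) * bs)
    D_inv[s, s] = np.linalg.inv(A[s, s])
A_scaled = D_inv @ A                  # block-diagonally scaled system
```

The scaled matrix has a much smaller condition number than the original, which is the "significant effect on the rate of convergence" the abstract reports.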
NASA Astrophysics Data System (ADS)
Adrian, S. B.; Andriulli, F. P.; Eibert, T. F.
2017-02-01
A new hierarchical basis preconditioner for the electric field integral equation (EFIE) operator is introduced. In contrast to existing hierarchical basis preconditioners, it works on arbitrary meshes and preconditions both the vector and the scalar potential within the EFIE operator. This is obtained by taking into account that the vector and the scalar potential discretized with loop-star basis functions are related to the hypersingular and the single layer operator (i.e., the well known integral operators from acoustics). For the single layer operator discretized with piecewise constant functions, a hierarchical preconditioner can easily be constructed. Thus the strategy we propose in this work for preconditioning the EFIE is the transformation of the scalar and the vector potential into operators equivalent to the single layer operator and to its inverse. More specifically, when the scalar potential is discretized with star functions as source and testing functions, the resulting matrix is a single layer operator discretized with piecewise constant functions and multiplied left and right with two additional graph Laplacian matrices. By inverting these graph Laplacian matrices, the discretized single layer operator is obtained, which can be preconditioned with the hierarchical basis. Dually, when the vector potential is discretized with loop functions, the resulting matrix can be interpreted as a hypersingular operator discretized with piecewise linear functions. By leveraging a scalar Calderón identity, we can interpret this operator as spectrally equivalent to the inverse single layer operator. Then we use a linear-in-complexity, closed-form inverse of the dual hierarchical basis to precondition the hypersingular operator. The numerical results show the effectiveness of the proposed preconditioner and the practical impact of theoretical developments in real case scenarios.
Shi, Yiquan; Wolfensteller, Uta; Schubert, Torsten; Ruge, Hannes
2018-02-01
Cognitive flexibility is essential to cope with changing task demands and often it is necessary to adapt to combined changes in a coordinated manner. The present fMRI study examined how the brain implements such multi-level adaptation processes. Specifically, on a "local," hierarchically lower level, switching between two tasks was required across trials while the rules of each task remained unchanged for blocks of trials. On a "global" level regarding blocks of twelve trials, the task rules could reverse or remain the same. The current task was cued at the start of each trial while the current task rules were instructed before the start of a new block. We found that partly overlapping and partly segregated neural networks play different roles when coping with the combination of global rule reversal and local task switching. The fronto-parietal control network (FPN) supported the encoding of reversed rules at the time of explicit rule instruction. The same regions subsequently supported local task switching processes during actual implementation trials, irrespective of rule reversal condition. By contrast, a cortico-striatal network (CSN) including supplementary motor area and putamen was increasingly engaged across implementation trials and more so for rule reversal than for nonreversal blocks, irrespective of task switching condition. Together, these findings suggest that the brain accomplishes the coordinated adaptation to multi-level demand changes by distributing processing resources either across time (FPN for reversed rule encoding and later for task switching) or across regions (CSN for reversed rule implementation and FPN for concurrent task switching). © 2017 Wiley Periodicals, Inc.
Domain decomposition algorithms and computational fluid dynamics
NASA Technical Reports Server (NTRS)
Chan, Tony F.
1988-01-01
In the past several years, domain decomposition has been a very popular topic, motivated partly by the potential for parallelization. While a large body of theory and algorithms has been developed for model elliptic problems, these methods are only recently starting to be tested on realistic applications. The application of some of these methods to two model problems in computational fluid dynamics is investigated: two-dimensional convection-diffusion problems and the incompressible driven-cavity flow problem. The construction and analysis of efficient preconditioners for the interface operator, to be used in the iterative solution of the interface problem, are described. For the convection-diffusion problems, the effect of the convection term and its discretization on the performance of some of the preconditioners is discussed. For the driven-cavity problem, the effectiveness of a class of boundary probe preconditioners is discussed.
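The interface operator at the heart of such methods is a Schur complement. A minimal 1-D illustration, assuming a two-subdomain splitting with a single interface unknown (not the boundary-probe preconditioners of the paper), shows the reduction: solve the small interface system first, then the subdomain problems independently.

```python
import numpy as np

# 1-D Poisson on n interior points, split into two subdomains that share one
# interface unknown.  Eliminating the subdomain interiors yields the Schur
# complement system on the interface.
n = 21
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
i = n // 2                               # interface index
mask = np.ones(n, bool); mask[i] = False
Aii = A[np.ix_(mask, mask)]              # subdomain-interior block (block-diagonal)
Aig = A[np.ix_(mask, [i])]               # coupling to the interface
Agg = A[i, i]
# Interface system: S x_g = g, with S = Agg - Agi Aii^{-1} Aig
S = Agg - Aig.T @ np.linalg.solve(Aii, Aig)
g = b[i] - Aig.T @ np.linalg.solve(Aii, b[mask])
x_g = np.linalg.solve(S, g)
# Back-substitute: subdomain solves are independent of each other
x_int = np.linalg.solve(Aii, b[mask] - Aig @ x_g)
x = np.empty(n); x[i] = x_g[0]; x[mask] = x_int
```

In practice S is never formed explicitly; it is applied via subdomain solves inside a Krylov iteration, which is why preconditioners for the interface operator matter.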
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bruss, D. E.; Morel, J. E.; Ragusa, J. C.
2013-07-01
Preconditioners based upon sweeps and diffusion-synthetic acceleration have been constructed and applied to the zeroth and first spatial moments of the 1-D S_n transport equation using a strictly non-negative nonlinear spatial closure. Linear and nonlinear preconditioners have been analyzed. The effectiveness of various combinations of these preconditioners is compared. In one dimension, nonlinear sweep preconditioning is shown to be superior to linear sweep preconditioning, and DSA preconditioning using nonlinear sweeps in conjunction with a linear diffusion equation is found to be essentially equivalent to nonlinear sweeps in conjunction with a nonlinear diffusion equation. The ability to use a linear diffusion equation has important implications for preconditioning the S_n equations with a strictly non-negative spatial discretization in multiple dimensions.
NASA Technical Reports Server (NTRS)
Lin, Shu; Rhee, Dojun
1996-01-01
This paper is concerned with the construction of multilevel concatenated block modulation codes using a multi-level concatenation scheme for the frequency non-selective Rayleigh fading channel. In the construction of a multilevel concatenated modulation code, block modulation codes are used as the inner codes. Various types of codes (block or convolutional, binary or nonbinary) are considered as the outer codes. In particular, we focus on the special case in which Reed-Solomon (RS) codes are used as the outer codes. For this special case, a systematic algebraic technique for constructing q-level concatenated block modulation codes is proposed. Codes have been constructed for certain specific values of q and compared with single-level concatenated block modulation codes using the same inner codes. A multilevel closest-coset decoding scheme for these codes is proposed.
Udzawa-type iterative method with parareal preconditioner for a parabolic optimal control problem
NASA Astrophysics Data System (ADS)
Lapin, A.; Romanenko, A.
2016-11-01
The article deals with an optimal control problem governed by a parabolic state equation, with point-wise constraints on the state and control functions. The objective functional involves an observation given in the domain at each moment of time. Conditions for the convergence of the Uzawa-type iterative method are given, and a parareal method is used to construct the preconditioner. Computational results are presented.
Tensor-product preconditioners for a space-time discontinuous Galerkin method
NASA Astrophysics Data System (ADS)
Diosady, Laslo T.; Murman, Scott M.
2014-10-01
A space-time discontinuous Galerkin spectral element discretization is presented for direct numerical simulation of the compressible Navier-Stokes equations. An efficient solution technique based on a matrix-free Newton-Krylov method is presented. A diagonalized alternating direction implicit preconditioner is extended to a space-time formulation using entropy variables. The effectiveness of this technique is demonstrated for the direct numerical simulation of turbulent flow in a channel.
Using Perturbed QR Factorizations To Solve Linear Least-Squares Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Avron, Haim; Ng, Esmond G.; Toledo, Sivan
2008-03-21
We propose and analyze a new tool to help solve sparse linear least-squares problems min_x ||Ax - b||_2. Our method is based on a sparse QR factorization of a low-rank perturbation Â of A. More precisely, we show that the R factor of Â is an effective preconditioner for the least-squares problem min_x ||Ax - b||_2 when solved using LSQR. We propose applications for the new technique. When A is rank deficient, we can add rows to ensure that the preconditioner is well-conditioned without column pivoting. When A is sparse except for a few dense rows, we can drop these dense rows from A to obtain Â. Another application is solving an updated or downdated problem. If R is a good preconditioner for the original problem A, it is a good preconditioner for the updated/downdated problem Â. We can also solve what-if scenarios, where we want to find the solution if a column of the original matrix is changed/removed. We present a spectral theory that analyzes the generalized spectrum of the pencil (A*A, R*R) and analyze the applications.
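The R-factor preconditioning idea can be sketched with dense NumPy QR and SciPy's LSQR; the "perturbation" below (rescaling a few rows) is a hypothetical stand-in for the paper's row additions/deletions, and the right preconditioner is applied through a LinearOperator since LSQR has no preconditioner argument of its own.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(3)
m, n = 100, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# R factor of a perturbed matrix Ahat (hypothetical perturbation: rescale rows)
Ahat = A.copy()
Ahat[:3] *= 10.0
R = np.linalg.qr(Ahat, mode="r")

# Right-precondition: solve min || (A R^{-1}) y - b ||_2, then recover x = R^{-1} y
AR = LinearOperator((m, n),
                    matvec=lambda y: A @ np.linalg.solve(R, y),
                    rmatvec=lambda z: np.linalg.solve(R.T, A.T @ z))
y = lsqr(AR, b, atol=1e-12, btol=1e-12)[0]
x = np.linalg.solve(R, y)

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference solution
```

When R comes from a matrix close to A, the preconditioned operator A R^{-1} has clustered singular values, so LSQR converges quickly while still solving the original least-squares problem.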
Solving coupled groundwater flow systems using a Jacobian Free Newton Krylov method
NASA Astrophysics Data System (ADS)
Mehl, S.
2012-12-01
Jacobian-Free Newton-Krylov (JFNK) methods can have several advantages for simulating coupled groundwater flow processes versus conventional methods. Conventional methods are defined here as those based on an iterative coupling (rather than a direct coupling) and/or that use Picard iteration rather than Newton iteration. In an iterative coupling, the systems are solved separately, coupling information is updated and exchanged between the systems, and the systems are re-solved, etc., until convergence is achieved. Trusted simulators, such as Modflow, are based on these conventional methods of coupling and work well in many cases. An advantage of the JFNK method is that it only requires calculation of the residual vector of the system of equations and thus can make use of existing simulators regardless of how the equations are formulated. This opens the possibility of coupling different process models via augmentation of a residual vector by each separate process, which often requires substantially fewer changes to the existing source code than if the processes were directly coupled. However, appropriate perturbation sizes need to be determined for accurate approximations of the Frechet derivative, which is not always straightforward. Furthermore, preconditioning is necessary for reasonable convergence of the linear solution required at each Krylov iteration. Existing preconditioners can be used and applied separately to each process, which maximizes use of existing code and robust preconditioners. In this work, iteratively coupled parent-child local grid refinement models of groundwater flow and groundwater flow models with nonlinear exchanges to streams are used to demonstrate the utility of the JFNK approach for Modflow models. Use of incomplete Cholesky preconditioners with various levels of fill are examined on a suite of nonlinear and linear models to analyze the effect of the preconditioner.
Comparisons of convergence and computation time are made between conventional iteratively coupled and Picard-based methods and those formulated with JFNK, to gain insight into the types of nonlinearities and system features that make one approach advantageous. Results indicate that nonlinearities associated with stream/aquifer exchanges are more problematic than those resulting from unconfined flow.
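The JFNK mechanics described above (residual-only access, Frechet derivative by directional differencing, Krylov inner solve) can be sketched on a toy nonlinear system; the problem below is a hypothetical stand-in, not a groundwater model, and the fixed perturbation size is the simplest of the choices the abstract notes can be delicate.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Toy problem: F(u) = A u + 0.1 u^3 - b = 0, with A a 1-D Laplacian.
# The Jacobian is never formed; its action on v is approximated by
# differencing the residual, exactly as in JFNK.
rng = np.random.default_rng(2)
n = 30
A = (np.diag(np.full(n, 2.0))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
b = rng.standard_normal(n)
F = lambda u: A @ u + 0.1 * u**3 - b

u = np.zeros(n)
for _ in range(30):                          # Newton (outer) iterations
    r = F(u)
    if np.linalg.norm(r) < 1e-10:
        break
    eps = 1e-7                               # perturbation size for the Frechet derivative
    J = LinearOperator((n, n), lambda v: (F(u + eps * v) - r) / eps)
    du, _ = gmres(J, -r)                     # Krylov (inner) solve, Jacobian-free
    u = u + du
```

Only `F` touches the model equations, which is what lets JFNK wrap existing simulators: augmenting the residual vector with another process's equations couples the processes without restructuring either code.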
Hadagali, Prasannaah; Peters, James R; Balasubramanian, Sriram
2018-03-01
Personalized Finite Element (FE) models and hexahedral elements are preferred for biomechanical investigations. Feature-based multi-block methods are used to develop anatomically accurate personalized FE models with hexahedral mesh. It is tedious to manually construct multi-blocks for a large number of geometries on an individual basis to develop personalized FE models. The mesh-morphing method mitigates the tediousness of meshing each personalized geometry, but leads to element warping and loss of geometric data. Such issues increase in magnitude when a normative spine FE model is morphed to a scoliosis-affected spinal geometry. The only way to bypass hex-mesh distortion or loss of geometry as a result of morphing is to rely on manually constructing the multi-blocks for the scoliosis-affected spine geometry of each individual, which is time-intensive. A method to semi-automate the construction of multi-blocks on the geometry of scoliosis vertebrae from the existing multi-blocks of normative vertebrae is demonstrated in this paper. High-quality hexahedral elements were generated on the scoliosis vertebrae from the morphed multi-blocks of normative vertebrae. Constructing the multi-blocks took three months for the normative spine and less than a day for the scoliosis spine. The effort required to construct multi-blocks on personalized scoliosis spinal geometries is thus significantly reduced by morphing existing multi-blocks.
NASA Technical Reports Server (NTRS)
Leutenegger, Scott T.; Horton, Graham
1994-01-01
Recently the Multi-Level algorithm was introduced as a general-purpose solver for steady-state Markov chains. In this paper, we consider the performance of the Multi-Level algorithm for solving Nearly Completely Decomposable (NCD) Markov chains, for which special-purpose iterative aggregation/disaggregation algorithms, such as the Koury-McAllister-Stewart (KMS) method, have been developed to exploit the decomposability of the Markov chain. We present experimental results indicating that the general-purpose Multi-Level algorithm is competitive, and can be significantly faster than the special-purpose KMS algorithm when Gauss-Seidel and Gaussian elimination are used for solving the individual blocks.
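The NCD structure that motivates these special-purpose solvers is easy to exhibit: with weak cross-block coupling, the baseline power iteration mixes very slowly, which is exactly what aggregation/disaggregation and Multi-Level schemes accelerate. The chain below is a small hypothetical example, not one from the paper.

```python
import numpy as np

# A small nearly completely decomposable (NCD) chain: two strongly coupled
# 2-state blocks with weak (eps) cross-block coupling.  Power iteration is
# the baseline; on NCD chains its convergence factor is roughly 1 - O(eps),
# hence the many iterations needed here.
eps = 1e-3
P = np.array([[0.7,       0.3 - eps, eps / 2,   eps / 2],
              [0.4 - eps, 0.6,       eps / 2,   eps / 2],
              [eps / 2,   eps / 2,   0.5,       0.5 - eps],
              [eps / 2,   eps / 2,   0.8 - eps, 0.2]])
assert np.allclose(P.sum(axis=1), 1.0)       # row-stochastic

pi = np.full(4, 0.25)                        # initial distribution
for _ in range(20000):                       # slow NCD mixing
    pi = pi @ P
```

Aggregation/disaggregation methods instead solve each block exactly and couple them through a small aggregated chain, removing the 1 - O(eps) bottleneck.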
NASA Astrophysics Data System (ADS)
Yusa, Yasunori; Okada, Hiroshi; Yamada, Tomonori; Yoshimura, Shinobu
2018-04-01
A domain decomposition method for large-scale elastic-plastic problems is proposed. The proposed method is based on a quasi-Newton method in conjunction with a balancing domain decomposition preconditioner. The use of a quasi-Newton method overcomes two problems associated with the conventional domain decomposition method based on the Newton-Raphson method: (1) it avoids a double-loop iteration algorithm, which generally has large computational complexity, and (2) it accounts for the local concentration of nonlinear deformation observed in elastic-plastic problems with stress concentration. Moreover, the application of a balancing domain decomposition preconditioner ensures scalability. Several numerical tests, including weak scaling tests, were performed using the conventional and proposed domain decomposition methods. The convergence performance of the proposed method is comparable to that of the conventional method. In particular, in elastic-plastic analysis, the proposed method exhibits better convergence performance than the conventional method.
Newton-Krylov-Schwarz: An implicit solver for CFD
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Keyes, David E.; Venkatakrishnan, V.
1995-01-01
Newton-Krylov methods and Krylov-Schwarz (domain decomposition) methods have begun to become established in computational fluid dynamics (CFD) over the past decade. The former employ a Krylov method inside of Newton's method in a Jacobian-free manner, through directional differencing. The latter employ an overlapping Schwarz domain decomposition to derive a preconditioner for the Krylov accelerator that relies primarily on local information, for data-parallel concurrency. They may be composed as Newton-Krylov-Schwarz (NKS) methods, which seem particularly well suited for solving nonlinear elliptic systems in high-latency, distributed-memory environments. We give a brief description of this family of algorithms, with an emphasis on domain decomposition iterative aspects. We then describe numerical simulations with Newton-Krylov-Schwarz methods on aerodynamics applications emphasizing comparisons with a standard defect-correction approach, subdomain preconditioner consistency, subdomain preconditioner quality, and the effect of a coarse grid.
NASA Astrophysics Data System (ADS)
Whiteley, J. P.
2017-10-01
Large, incompressible elastic deformations are governed by a system of nonlinear partial differential equations. The finite element discretisation of these partial differential equations yields a system of nonlinear algebraic equations that are usually solved using Newton's method. On each iteration of Newton's method, a linear system must be solved. We exploit the structure of the Jacobian matrix to propose a preconditioner, comprising two steps. The first step is the solution of a relatively small, symmetric, positive definite linear system using the preconditioned conjugate gradient method. This is followed by a small number of multigrid V-cycles for a larger linear system. Through the use of exemplar elastic deformations, the preconditioner is demonstrated to facilitate the iterative solution of the linear systems arising. The number of GMRES iterations required has only a very weak dependence on the number of degrees of freedom of the linear systems.
Preconditioning strategies for nonlinear conjugate gradient methods, based on quasi-Newton updates
NASA Astrophysics Data System (ADS)
Andrea, Caliciotti; Giovanni, Fasano; Massimo, Roma
2016-10-01
This paper reports two proposals of possible preconditioners for the Nonlinear Conjugate Gradient (NCG) method in large-scale unconstrained optimization. On one hand, the common idea of our preconditioners is inspired by L-BFGS quasi-Newton updates; on the other hand, we aim at explicitly approximating, in some sense, the inverse of the Hessian matrix. Since we deal with large-scale optimization problems, we propose matrix-free approaches where the preconditioners are built using symmetric low-rank updating formulae. Our distinctive new contributions rely on using information on the objective function collected as a by-product of the NCG at previous iterations. Broadly speaking, our first approach exploits the secant equation in order to impose interpolation conditions on the objective function. In the second proposal we adopt an ad hoc modified-secant approach, in order to possibly guarantee some additional theoretical properties.
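The secant-equation building block these preconditioners share can be shown with a single BFGS-style inverse update; this is the generic quasi-Newton construction, not the paper's specific low-rank formulae, and the (s, y) pair below is synthetic rather than collected from an actual NCG run.

```python
import numpy as np

# One BFGS inverse-Hessian update built from a step s and gradient
# difference y -- the kind of by-product information an NCG iteration
# produces.  The resulting H satisfies the secant equation H y = s.
rng = np.random.default_rng(5)
n = 6
s = rng.standard_normal(n)          # previous step
y = rng.standard_normal(n)          # gradient difference
if s @ y < 0:                       # enforce the curvature condition s^T y > 0
    y = -y
rho = 1.0 / (s @ y)
I = np.eye(n)
# Inverse BFGS update applied to the initial approximation H0 = I
H = (I - rho * np.outer(s, y)) @ I @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
```

In a matrix-free setting H is never stored: its action on a vector is applied through the stored (s, y) pairs, which keeps the preconditioner low-rank and cheap.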
Improving Block-level Efficiency with scsi-mq
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caldwell, Blake A
2015-01-01
Current-generation solid-state storage devices are exposing new bottlenecks in the SCSI and block layers of the Linux kernel, where IO throughput is limited by lock contention, inefficient interrupt handling, and poor memory locality. To address these limitations, the Linux kernel block layer underwent a major rewrite with the blk-mq project to move from a single request queue to a multi-queue model. The Linux SCSI subsystem rework to make use of this new model, known as scsi-mq, has been merged into the Linux kernel, and work is underway for dm-multipath support in the upcoming Linux 4.0 kernel. These pieces were necessary to make use of the multi-queue block layer in a Lustre parallel filesystem with high-availability requirements. We undertook adding support for the 3.18 kernel to Lustre with scsi-mq and dm-multipath patches to evaluate the potential of these efficiency improvements. In this paper we evaluate the block-level performance of scsi-mq with backing storage hardware representative of an HPC-targeted Lustre filesystem. Our findings show that SCSI write request latency is reduced by as much as 13.6%. Additionally, when profiling the CPU usage of our prototype Lustre filesystem, we found that CPU idle time increased by a factor of 7 with Linux 3.18 and blk-mq as compared to a standard 2.6.32 Linux kernel. Our findings demonstrate increased efficiency of the multi-queue block layer even with disk-based caching storage arrays used in existing parallel filesystems.
SymPix: A Spherical Grid for Efficient Sampling of Rotationally Invariant Operators
NASA Astrophysics Data System (ADS)
Seljebotn, D. S.; Eriksen, H. K.
2016-02-01
We present SymPix, a special-purpose spherical grid optimized for efficiently sampling rotationally invariant linear operators. This grid is conceptually similar to the Gauss-Legendre (GL) grid, aligning sample points with iso-latitude rings located on Legendre polynomial zeros. Unlike the GL grid, however, the number of grid points per ring varies as a function of latitude, avoiding expensive oversampling near the poles and ensuring nearly equal sky area per grid point. The ratio between the number of grid points in two neighboring rings is required to be a low-order rational number (3, 2, 1, 4/3, 5/4, or 6/5) to maintain a high degree of symmetry. Our main motivation for this grid is to solve linear systems using multi-grid methods, and to construct efficient preconditioners through pixel-space sampling of the linear operator in question. As a benchmark and representative example, we compute a preconditioner for a linear system that involves the operator D̂ + B̂ᵀN⁻¹B̂, where B̂ and D̂ may be described as both local and rotationally invariant operators, and N is diagonal in the pixel domain. For a bandwidth limit of ℓ_max = 3000, we find that our new SymPix implementation yields average speed-ups of 360 and 23 for B̂ᵀN⁻¹B̂ and D̂, respectively, compared with the previous state-of-the-art implementation.
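The constraint that neighboring rings keep a low-order rational point ratio can be illustrated with a toy count-selection routine. This sketches only the ratio constraint, not SymPix's actual grid construction; the starting count and ring layout are hypothetical.

```python
import math
from fractions import Fraction

# Neighbor-ring point ratios allowed by the abstract: 3, 2, 1, 4/3, 5/4, 6/5.
ALLOWED = [Fraction(1), Fraction(6, 5), Fraction(5, 4), Fraction(4, 3),
           Fraction(2), Fraction(3)]

def ring_counts(n_equator, n_rings):
    """Toy selection of integer points per ring, from the equator toward one
    pole, keeping every neighbor ratio in ALLOWED while tracking sin(theta)."""
    counts = [n_equator]
    for i in range(1, n_rings):
        theta = math.pi / 2 * (1 - i / n_rings)    # colatitude of ring i
        target = n_equator * math.sin(theta)       # equal-area ideal count
        prev = counts[-1]
        # candidate counts whose ratio to the previous ring is exactly allowed
        cands = [prev * r.denominator // r.numerator for r in ALLOWED
                 if (prev * r.denominator) % r.numerator == 0]
        counts.append(min(cands, key=lambda c: abs(c - target)))
    return counts
```

Starting from a highly composite equatorial count (e.g. 4320 = 2^5·3^3·5) keeps many exact rational reductions available, which is why such counts preserve the grid's symmetries.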
Effective matrix-free preconditioning for the augmented immersed interface method
NASA Astrophysics Data System (ADS)
Xia, Jianlin; Li, Zhilin; Ye, Xin
2015-12-01
We present effective and efficient matrix-free preconditioning techniques for the augmented immersed interface method (AIIM). AIIM has been developed recently and is shown to be very effective for interface problems and problems on irregular domains. GMRES is often used to solve for the augmented variable(s) associated with a Schur complement A in AIIM that is defined along the interface or the irregular boundary. The efficiency of AIIM relies on how quickly the system for A can be solved. For some applications, there are substantial difficulties involved, such as the slow convergence of GMRES (particularly for free boundary and moving interface problems), and the inconvenience of finding a preconditioner (since only products of A with vectors are available). Here, we propose matrix-free structured preconditioning techniques for AIIM via adaptive randomized sampling, using only the products of A and vectors to construct a hierarchically semiseparable matrix approximation to A. Several improvements over existing schemes are shown so as to enhance the efficiency and also avoid potential instability. The significance of the preconditioners includes: (1) they do not require the entries of A or the multiplication of Aᵀ with vectors; (2) constructing the preconditioners needs only O(log N) matrix-vector products and O(N) storage, where N is the size of A; (3) applying the preconditioners needs only O(N) flops; (4) they are very flexible and do not require any a priori knowledge of the structure of A. The preconditioners are observed to significantly accelerate the convergence of GMRES, with heuristic justification of their effectiveness. Comprehensive tests on several important applications are provided, such as Navier-Stokes equations on irregular domains with traction boundary conditions, interface problems in incompressible flows, mixed boundary problems, and free boundary problems.
The preconditioning techniques are also useful for several other problems and methods.
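The hierarchically semiseparable construction itself is beyond a short sketch, but the matrix-free principle — building a structured approximation from products of A with random vectors only, with no access to entries of A or to Aᵀ — can be illustrated with a plain randomized range finder. The function name and the flat low-rank format below are illustrative stand-ins, not the paper's scheme.

```python
import numpy as np

def randomized_range(matvec, n, k, oversample=5, seed=0):
    """Approximate an orthonormal basis Q for the dominant range of A using
    only k + oversample products of A with random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, k + oversample))
    # Each column of Y costs exactly one black-box product A @ omega_j.
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(k + oversample)])
    Q, _ = np.linalg.qr(Y)
    return Q
```

For an operator whose singular values decay, A ≈ Q(QᵀA) is accurate with a small number of matvecs — the same access pattern (products with A only) that makes the paper's preconditioner construction feasible when A is available only implicitly.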
NASA Astrophysics Data System (ADS)
Mundis, Nathan L.; Mavriplis, Dimitri J.
2017-09-01
The time-spectral method applied to the Euler and coupled aeroelastic equations theoretically offers significant computational savings for purely periodic problems when compared to standard time-implicit methods. However, attaining superior efficiency with time-spectral methods over traditional time-implicit methods hinges on the ability to rapidly solve the large non-linear system resulting from time-spectral discretizations, which becomes larger and stiffer as more time instances are employed or the period of the flow becomes especially short (i.e. the maximum resolvable wave-number increases). In order to increase the efficiency of these solvers, and to improve robustness, particularly for large numbers of time instances, the Generalized Minimal Residual Method (GMRES) is used to solve the implicit linear system over all coupled time instances. The use of GMRES as the linear solver makes time-spectral methods more robust, allows them to be applied to a far greater subset of time-accurate problems, including those with a broad range of harmonic content, and vastly improves the efficiency of time-spectral methods. In previous work, a wave-number independent preconditioner that mitigates the increased stiffness of the time-spectral method when applied to problems with large resolvable wave numbers was developed. This preconditioner, however, directly inverts a large matrix whose size increases in proportion to the number of time instances. As a result, the computational time of this method scales as the cube of the number of time instances. In the present work, this preconditioner has been reworked to take advantage of an approximate-factorization approach that effectively decouples the spatial and temporal systems. Once decoupled, the time-spectral matrix can be inverted in frequency space, where it has entries only on the main diagonal and therefore can be inverted quite efficiently.
This new GMRES/preconditioner combination is shown to be over an order of magnitude more efficient than the previous wave-number independent preconditioner for problems with large numbers of time instances and/or large reduced frequencies.
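The key property being exploited — that the time-spectral coupling matrix is diagonal in frequency space — can be checked directly with the standard Fourier time-derivative matrix for an odd number of time instances. This is a sketch of the property, not of the paper's preconditioner.

```python
import numpy as np

def timespectral_derivative(N, omega=1.0):
    """Dense time-spectral derivative matrix for N (odd) equally spaced time
    instances over one period T = 2*pi/omega."""
    D = np.zeros((N, N))
    for j in range(N):
        for k in range(N):
            if j != k:
                D[j, k] = 0.5 * (-1.0) ** (j - k) / np.sin(np.pi * (j - k) / N)
    return omega * D

# In the time domain D couples all instances (a dense block). In frequency
# space it acts as diag(i*k*omega) on each harmonic e^{i k omega t}, so the
# temporal part of the system can be inverted harmonic by harmonic instead of
# as one coupled matrix whose factorization cost grows cubically with N.
```

The diagonal action on each harmonic is exactly what allows the reworked preconditioner to invert the temporal system cheaply after the approximate factorization decouples it from the spatial system.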
NASA Astrophysics Data System (ADS)
Wei, Chun-Yan; Gao, Fei; Wen, Qiao-Yan; Wang, Tian-Yin
2014-12-01
Until now, the only kind of practical quantum private query (QPQ), quantum-key-distribution (QKD)-based QPQ, focuses on the retrieval of a single bit. In fact, a meaningful message is generally composed of multiple adjacent bits (i.e., a multi-bit block). To obtain a message a₁a₂⋯a_l from the database, the user Alice has to query l times to get each a_i. In this case, the server Bob could compromise Alice's privacy once he obtains the address she queried in any of the l queries, since each a_i contributes to the message Alice retrieves. Apparently, the longer the retrieved message is, the worse the user privacy becomes. To solve this problem, via an unbalanced-state technique and based on a variant of the multi-level BB84 protocol, we present a protocol for QPQ of blocks, which allows the user to retrieve a multi-bit block from the database in one query. Our protocol is somewhat like a high-dimension version of the first QKD-based QPQ protocol proposed by Jacobi et al., but some nontrivial modifications are necessary.
2015-06-01
…efficient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show … apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying … repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cai, Yunfeng, E-mail: yfcai@math.pku.edu.cn; Department of Computer Science, University of California, Davis 95616; Bai, Zhaojun, E-mail: bai@cs.ucdavis.edu
2013-12-15
The iterative diagonalization of a sequence of large ill-conditioned generalized eigenvalue problems is a computational bottleneck in quantum mechanical methods employing a nonorthogonal basis for ab initio electronic structure calculations. We propose a hybrid preconditioning scheme to effectively combine global and locally accelerated preconditioners for rapid iterative diagonalization of such eigenvalue problems. In partition-of-unity finite-element (PUFE) pseudopotential density-functional calculations employing a nonorthogonal basis, we show that the hybrid preconditioned block steepest descent method is a cost-effective eigensolver, outperforming current state-of-the-art global preconditioning schemes, and as efficient for the ill-conditioned generalized eigenvalue problems produced by PUFE as the locally optimal block preconditioned conjugate-gradient method is for the well-conditioned standard eigenvalue problems produced by planewave methods.
Politics of innovation in multi-level water governance systems
NASA Astrophysics Data System (ADS)
Daniell, Katherine A.; Coombes, Peter J.; White, Ian
2014-11-01
Innovations are being proposed in many countries in order to support change towards more sustainable and water secure futures. However, the extent to which they can be implemented is subject to complex politics and powerful coalitions across multi-level governance systems and scales of interest. Exactly how innovation uptake can be best facilitated or blocked in these complex systems is thus a matter of important practical and research interest in water cycle management. From intervention research studies in Australia, China and Bulgaria, this paper seeks to describe and analyse the behind-the-scenes struggles and coalition-building that occur between water utility providers, private companies, experts, communities and all levels of government in an effort to support or block specific innovations. The research findings suggest that in order to ensure successful passage of proposed innovations, champions for them are required from at least two administrative levels, including one with innovation implementation capacity, as part of a larger supportive coalition. Higher governance levels can play an important enabling role in facilitating the passage of certain types of innovations that may be in competition with currently entrenched systems of water management. Due to a range of natural biases, experts on certain innovations and disciplines may form part of supporting or blocking coalitions, but their evaluations of worth for water system sustainability and security are likely to be subject to competing claims based on different values and expertise, and so may not necessarily be of use in resolving questions of "best courses of action". This remains a political, values-based decision to be negotiated through the receiving multi-level water governance system.
Least Reliable Bits Coding (LRBC) for high data rate satellite communications
NASA Technical Reports Server (NTRS)
Vanderaar, Mark; Wagner, Paul; Budinger, James
1992-01-01
An analysis and discussion of a bandwidth efficient multi-level/multi-stage block coded modulation technique called Least Reliable Bits Coding (LRBC) is presented. LRBC uses simple multi-level component codes that provide increased error protection on increasingly unreliable modulated bits in order to maintain an overall high code rate that increases spectral efficiency. Further, soft-decision multi-stage decoding is used to make decisions on unprotected bits through corrections made on more protected bits. Using analytical expressions and tight performance bounds it is shown that LRBC can achieve increased spectral efficiency and maintain equivalent or better power efficiency compared to that of Binary Phase Shift Keying (BPSK). Bit error rates (BER) vs. channel bit energy with Additive White Gaussian Noise (AWGN) are given for a set of LRB Reed-Solomon (RS) encoded 8PSK modulation formats with an ensemble rate of 8/9. All formats exhibit a spectral efficiency of 2.67 = (log2(8))(8/9) information bps/Hz. Bit by bit coded and uncoded error probabilities with soft-decision information are determined. These are traded off against code rate to determine parameters that achieve good performance. The relative simplicity of Galois field algebra vs. the Viterbi algorithm and the availability of high speed commercial Very Large Scale Integration (VLSI) for block codes indicate that LRBC using block codes is a desirable method for high data rate implementations.
A multilevel preconditioner for domain decomposition boundary systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.
1991-12-11
In this note, we consider multilevel preconditioning of the reduced boundary systems which arise in non-overlapping domain decomposition methods. It will be shown that the resulting preconditioned systems have condition numbers which are bounded in the case of multilevel spaces on the whole domain, and which grow at most proportionally to the number of levels in the case of multilevel boundary spaces without multilevel extensions into the interior.
Condition number estimation of preconditioned matrices.
Kushida, Noriyuki
2015-01-01
The present paper introduces a condition number estimation method for preconditioned matrices. The newly developed method provides reasonable results, while the conventional method, which is based on the Lanczos connection, gives meaningless results. The Lanczos connection based method provides the condition numbers of coefficient matrices of systems of linear equations using information obtained through the preconditioned conjugate gradient method. Estimating the condition number of preconditioned matrices is sometimes important when describing the effectiveness of new preconditioners or selecting adequate preconditioners. Operating a preconditioner on a coefficient matrix is the simplest method of estimation. However, this is not possible for large-scale computing, especially if computation is performed on distributed memory parallel computers, because the preconditioned matrices become dense even if the original matrices are sparse. Although the Lanczos connection method can be used to calculate the condition number of preconditioned matrices, it is not considered applicable to large-scale problems because of its weakness with respect to numerical errors. Therefore, we have developed a robust and parallelizable method based on Hager's method. The feasibility studies are carried out for the diagonal scaling preconditioner and the SSOR preconditioner with a diagonal matrix, a tri-diagonal matrix and Pei's matrix. As a result, the Lanczos connection method contains around 10% error in the results even with a simple problem. On the other hand, the new method contains negligible errors. In addition, the newly developed method returns reasonable solutions when the Lanczos connection method fails with Pei's matrix, and with matrices generated by the finite element method.
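Hager's method, on which the new estimator builds, needs only products with B and Bᵀ, never the entries of B; the condition number then follows as κ₁(B) ≈ est(B)·est(B⁻¹), with B⁻¹ applied via a solve. A minimal version of the 1-norm estimator (illustrative; the paper's parallel variant adds robustness measures not shown here) might look like:

```python
import numpy as np

def hager_norm1(matvec, rmatvec, n, max_iter=10):
    """Estimate ||B||_1 using only products B @ x (matvec) and
    B.T @ x (rmatvec); returns a lower bound that is usually sharp."""
    x = np.full(n, 1.0 / n)          # start from the uniform vector
    est = 0.0
    for _ in range(max_iter):
        y = matvec(x)
        est = np.linalg.norm(y, 1)
        z = rmatvec(np.sign(y))      # gradient of x -> ||B x||_1
        j = int(np.argmax(np.abs(z)))
        if np.abs(z[j]) <= z @ x:    # local optimum reached: stop
            break
        x = np.zeros(n)              # restart from the best unit vector e_j
        x[j] = 1.0
    return est
```

Because only matvecs are required, the dense preconditioned matrix never needs to be formed — the property that makes the approach viable on distributed memory machines.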
Shape reanalysis and sensitivities utilizing preconditioned iterative boundary solvers
NASA Technical Reports Server (NTRS)
Guru Prasad, K.; Kane, J. H.
1992-01-01
The computational advantages associated with the utilization of preconditioned iterative equation solvers are quantified for the reanalysis of perturbed shapes using continuum structural boundary element analysis (BEA). Both single- and multi-zone three-dimensional problems are examined. Significant reductions in computer time are obtained by making use of previously computed solution vectors and preconditioners in subsequent analyses. The effectiveness of this technique is demonstrated for the computation of shape response sensitivities required in shape optimization. Computer times and accuracies achieved using the preconditioned iterative solvers are compared with those obtained via direct solvers and implicit differentiation of the boundary integral equations. It is concluded that this approach, employing preconditioned iterative equation solvers in reanalysis and sensitivity analysis, can be competitive with, if not superior to, approaches involving direct solvers.
High-Order Methods for Incompressible Fluid Flow
NASA Astrophysics Data System (ADS)
Deville, M. O.; Fischer, P. F.; Mund, E. H.
2002-08-01
High-order numerical methods provide an efficient approach to simulating many physical problems. This book considers the range of mathematical, engineering, and computer science topics that form the foundation of high-order numerical methods for the simulation of incompressible fluid flows in complex domains. Introductory chapters present high-order spatial and temporal discretizations for one-dimensional problems. These are extended to multiple space dimensions with a detailed discussion of tensor-product forms, multi-domain methods, and preconditioners for iterative solution techniques. Numerous discretizations of the steady and unsteady Stokes and Navier-Stokes equations are presented, with particular attention given to enforcement of incompressibility. Advanced discretizations, implementation issues, and parallel and vector performance are considered in the closing sections. Numerous examples are provided throughout to illustrate the capabilities of high-order methods in actual applications.
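The efficiency of the tensor-product forms discussed above rests on the identity (A ⊗ B) vec(X) = vec(B X Aᵀ) (with column-major vec), which lets a 2D spectral operator be applied through two small dense multiplies instead of ever forming the Kronecker matrix — O(n³) work in place of O(n⁴). A quick numerical check of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))   # 1D operator acting in the first direction
B = rng.standard_normal((n, n))   # 1D operator acting in the second direction
X = rng.standard_normal((n, n))   # nodal values on the n-by-n tensor grid

# Naive: form the n^2-by-n^2 Kronecker matrix and apply it.
naive = np.kron(A, B) @ X.ravel(order="F")

# Tensor-product form: two n-by-n multiplies, no large matrix.
fast = (B @ X @ A.T).ravel(order="F")

assert np.allclose(naive, fast)
```

The same factorization underlies fast diagonalization preconditioners for the separable operators that arise in the spectral-element discretizations the book covers.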
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lipnikov, Konstantin; Moulton, David; Svyatskiy, Daniil
2016-04-29
We develop a new approach for solving the nonlinear Richards' equation arising in variably saturated flow modeling. The growing complexity of geometric models for simulation of subsurface flows leads to the necessity of using unstructured meshes and advanced discretization methods. Typically, a numerical solution is obtained by first discretizing the PDEs and then solving the resulting system of nonlinear discrete equations with a Newton-Raphson-type method. Efficiency and robustness of the existing solvers rely on many factors, including an empiric quality control of intermediate iterates, the complexity of the employed discretization method, and a customized preconditioner. We propose and analyze a new preconditioning strategy that is based on a stable discretization of the continuum Jacobian. We show with numerical experiments for challenging problems in subsurface hydrology that this new preconditioner improves convergence of the existing Jacobian-free solvers 3-20 times. Furthermore, we show that the Picard method with this preconditioner becomes a more efficient nonlinear solver than several widely used Jacobian-free solvers.
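The Jacobian-free mechanism that such a preconditioner plugs into approximates Jacobian-vector products by finite differences of the residual, so Newton-Krylov iterations never need the Jacobian matrix itself. A minimal sketch on a model nonlinear problem (the problem, the Jacobi preconditioner, and all names here are stand-ins, not the Richards' equation discretization or the paper's preconditioner):

```python
import numpy as np

def jv(F, u, v, eps=1e-7):
    """Jacobian-free product: J(u) v ~= (F(u + eps*v) - F(u)) / eps."""
    return (F(u + eps * v) - F(u)) / eps

def pcg(matvec, b, precond, tol=1e-8, max_iter=100):
    """Preconditioned CG inner solve on the (approximate) Jacobian system."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Model problem: F(u) = A u + u^3 - b with A symmetric positive definite.
n = 20
A = 2.0 * np.eye(n) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
F = lambda u: A @ u + u ** 3 - b
M = lambda r: r / np.diag(A)          # Jacobi preconditioner (illustrative)

u = np.zeros(n)
for _ in range(20):                   # outer Newton iterations
    res = F(u)
    if np.linalg.norm(res) < 1e-8:
        break
    du = pcg(lambda v: jv(F, u, v), -res, M)
    u += du
```

A better preconditioner (such as one built from a stable discretization of the continuum Jacobian, as the abstract proposes) would replace `M` while the Jacobian-free outer loop stays unchanged.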
An extended GS method for dense linear systems
NASA Astrophysics Data System (ADS)
Niki, Hiroshi; Kohno, Toshiyuki; Abe, Kuniyoshi
2009-09-01
Davey and Rosindale [K. Davey, I. Rosindale, An iterative solution scheme for systems of boundary element equations, Internat. J. Numer. Methods Engrg. 37 (1994) 1399-1411] derived the GSOR method, which uses an upper triangular matrix Ω in order to solve dense linear systems. By applying functional analysis, the authors presented an expression for the optimum Ω. Moreover, Davey and Bounds [K. Davey, S. Bounds, A generalized SOR method for dense linear systems of boundary element equations, SIAM J. Comput. 19 (1998) 953-967] also introduced further interesting results. In this note, we employ a matrix analysis approach to investigate these schemes, and derive theorems that compare these schemes with existing preconditioners for dense linear systems. We show that the convergence rate of the Gauss-Seidel method with preconditioner P_G is superior to that of the GSOR method. Moreover, we define some splittings associated with the iterative schemes. Some numerical examples are reported to confirm the theoretical analysis. We show that the EGS method with preconditioner produces an extremely small spectral radius in comparison with the other schemes considered.
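The kind of comparison made in such notes — ranking schemes by the spectral radius of their iteration matrices — can be reproduced for the classical splittings. The note's preconditioner P_G itself is not reproduced here; this sketch only shows the spectral-radius machinery on the Jacobi and Gauss-Seidel splittings of a model matrix.

```python
import numpy as np

def jacobi_matrix(A):
    """Iteration matrix T_J = I - D^{-1} A of the Jacobi splitting."""
    D = np.diag(np.diag(A))
    return np.eye(len(A)) - np.linalg.solve(D, A)

def gauss_seidel_matrix(A):
    """Iteration matrix T_GS = (D + L)^{-1} U of the Gauss-Seidel splitting."""
    DL = np.tril(A)                  # D + L (lower triangle incl. diagonal)
    U = -np.triu(A, 1)               # minus the strictly upper triangle
    return np.linalg.solve(DL, U)

def spectral_radius(T):
    """rho(T): the iteration converges for any start iff rho(T) < 1."""
    return float(np.max(np.abs(np.linalg.eigvals(T))))

# Model system: for this consistently ordered matrix, rho_GS = rho_J^2,
# so Gauss-Seidel converges roughly twice as fast per sweep as Jacobi.
n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rho_j = spectral_radius(jacobi_matrix(A))
rho_gs = spectral_radius(gauss_seidel_matrix(A))
```

Comparing such radii before and after applying a candidate preconditioner is exactly how one quantifies claims like "the EGS method produces an extremely small spectral radius."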
Bi-level multi-source learning for heterogeneous block-wise missing data.
Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-11-15
Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified "bi-level" learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches.
2014-01-01
system (here using left-preconditioning) (K Ã)x = K b̃, where K is a low-order polynomial in Ã given by K = s(Ã) = Σ_{i=0}^{m} k_i Ã^i, and has a … system with a complex spectrum, the region E in the complex plane must be some convex form (e.g., an ellipse or polygon) that approximately encloses the … preconditioners with p = 2 and p = 20 on the spectrum of the preconditioned system matrices KÃ and KH̃ for both the CG Schur-complement form and DG form cases
Higher Order, Hybrid BEM/FEM Methods Applied to Antenna Modeling
NASA Technical Reports Server (NTRS)
Fink, P. W.; Wilton, D. R.; Dobbins, J. A.
2002-01-01
In this presentation, the authors address topics relevant to higher order modeling using hybrid BEM/FEM formulations. The first of these is the limitation on convergence rates imposed by geometric modeling errors in the analysis of scattering by a dielectric sphere. The second topic is the application of an Incomplete LU Threshold (ILUT) preconditioner to solve the linear system resulting from the BEM/FEM formulation. The final topic is the application of the higher order BEM/FEM formulation to antenna modeling problems. The authors have previously presented work on the benefits of higher order modeling. To achieve these benefits, special attention is required in the integration of singular and near-singular terms arising in the surface integral equation. Several methods for handling these terms have been presented. It is also well known that achieving the high rates of convergence afforded by higher order bases may also require the employment of higher order geometry models. A number of publications have described the use of quadratic elements to model curved surfaces. The authors have shown, in an EFIE formulation applied to scattering by a PEC sphere, that quadratic order elements may be insufficient to prevent the domination of modeling errors. In fact, on a PEC sphere with radius r = 0.58 λ₀, a quartic order geometry representation was required to obtain a convergence benefit from quadratic bases when compared to the convergence rate achieved with linear bases. Initial trials indicate that, for a dielectric sphere of the same radius, requirements on the geometry model are not as severe as for the PEC sphere. The authors will present convergence results for higher order bases as a function of the geometry model order in the hybrid BEM/FEM formulation applied to dielectric spheres. It is well known that the system matrix resulting from the hybrid BEM/FEM formulation is ill-conditioned.
For many real applications, a good preconditioner is required to obtain usable convergence from an iterative solver. The authors have examined the use of an Incomplete LU Threshold (ILUT) preconditioner to solve linear systems stemming from higher order BEM/FEM formulations in 2D scattering problems. Although the resulting preconditioner provided an excellent approximation to the system inverse, its size in terms of non-zero entries represented only a modest improvement when compared with the fill-in associated with a sparse direct solver. Furthermore, the fill-in of the preconditioner could not be substantially reduced without the occurrence of instabilities. In addition to the results for these 2D problems, the authors will present iterative solution data from the application of the ILUT preconditioner to 3D problems.
Combustion Stability Innovations for Liquid Rocket
2010-01-31
waves within the pipe. Acoustic time for one pass = 0.003 sec. Closed end. The following figure shows the second harmonic of the quarter-wave mode at … waveguides at the center of the test section. The two drivers at either end can operate in sync or at a specified phase difference. The effect of close … preserve conservation in real time. The preconditioner operates on the inner loop, driving the solution to the next time level. A sufficient number of inner…
Philip, Bobby; Berrill, Mark A.; Allu, Srikanth; ...
2015-01-26
We describe an efficient and nonlinearly consistent parallel solution methodology for solving coupled nonlinear thermal transport problems that occur in nuclear reactor applications over hundreds of individual 3D physical subdomains. Efficiency is obtained by leveraging knowledge of the physical domains, the physics on individual domains, and the couplings between them for preconditioning within a Jacobian-Free Newton Krylov method. Details of the computational infrastructure that enabled this work, namely the open source Advanced Multi-Physics (AMP) package developed by the authors, are described. Details of verification and validation experiments, and parallel performance analyses in weak and strong scaling studies demonstrating the achieved efficiency of the algorithm, are presented. Moreover, numerical experiments demonstrate that the preconditioner developed is independent of the number of fuel subdomains in a fuel rod, which is particularly important when simulating different types of fuel rods. Finally, we demonstrate the power of the coupling methodology by considering problems with couplings between surface and volume physics and coupling of nonlinear thermal transport in fuel rods to an external radiation transport code.
An electrostatic Particle-In-Cell code on multi-block structured meshes
NASA Astrophysics Data System (ADS)
Meierbachtol, Collin S.; Svyatskiy, Daniil; Delzanno, Gian Luca; Vernon, Louis J.; Moulton, J. David
2017-12-01
We present an electrostatic Particle-In-Cell (PIC) code on multi-block, locally structured, curvilinear meshes called Curvilinear PIC (CPIC). Multi-block meshes are essential to capture complex geometries accurately and with good mesh quality, something that would not be possible with single-block structured meshes that are often used in PIC and for which CPIC was initially developed. Despite the structured nature of the individual blocks, multi-block meshes resemble unstructured meshes in a global sense and introduce several new challenges, such as the presence of discontinuities in the mesh properties and coordinate orientation changes across adjacent blocks, and polyjunction points where an arbitrary number of blocks meet. In CPIC, these challenges have been met by an approach that features: (1) a curvilinear formulation of the PIC method: each mesh block is mapped from the physical space, where the mesh is curvilinear and arbitrarily distorted, to the logical space, where the mesh is uniform and Cartesian on the unit cube; (2) a mimetic discretization of Poisson's equation suitable for multi-block meshes; and (3) a hybrid (logical-space position/physical-space velocity), asynchronous particle mover that mitigates the performance degradation created by the necessity to track particles as they move across blocks. The numerical accuracy of CPIC was verified using two standard plasma-material interaction tests, which demonstrate good agreement with the corresponding analytic solutions. Compared to PIC codes on unstructured meshes, which have also been used for their flexibility in handling complex geometries but whose performance suffers from issues associated with data locality and indirect data access patterns, PIC codes on multi-block structured meshes may offer the best compromise for capturing complex geometries while also maintaining solution accuracy and computational efficiency.
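Step (1) of the CPIC approach — mapping each block from physical space to the logical unit cube — amounts, for a bilinear quadrilateral block in 2D, to inverting the shape-function map with a short Newton iteration. This is an illustrative 2D sketch with hypothetical corner coordinates, not CPIC's implementation.

```python
import numpy as np

def bilinear_map(corners, xi, eta):
    """Physical position of logical point (xi, eta) in [0,1]^2 for a
    quadrilateral block with corners ordered (00, 10, 11, 01)."""
    c00, c10, c11, c01 = corners
    return ((1 - xi) * (1 - eta) * c00 + xi * (1 - eta) * c10
            + xi * eta * c11 + (1 - xi) * eta * c01)

def to_logical(corners, x_phys, iters=20):
    """Newton inversion: recover (xi, eta) from a physical position, as a
    hybrid mover must do when a particle crosses into a new block."""
    c00, c10, c11, c01 = corners
    xi = eta = 0.5                           # start at the block center
    for _ in range(iters):
        r = bilinear_map(corners, xi, eta) - x_phys
        # Jacobian of the map with respect to (xi, eta)
        dxi = (1 - eta) * (c10 - c00) + eta * (c11 - c01)
        deta = (1 - xi) * (c01 - c00) + xi * (c11 - c10)
        J = np.column_stack([dxi, deta])
        step = np.linalg.solve(J, r)
        xi, eta = xi - step[0], eta - step[1]
    return xi, eta
```

Pushing particles in logical coordinates while keeping velocities in physical space, as CPIC's hybrid mover does, confines this inversion cost to the relatively rare block-crossing events.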
An electrostatic Particle-In-Cell code on multi-block structured meshes
Meierbachtol, Collin S.; Svyatskiy, Daniil; Delzanno, Gian Luca; ...
2017-09-14
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oh, Duk -Soon; Widlund, Olof B.; Zampini, Stefano
Here, a BDDC domain decomposition preconditioner is defined by a coarse component, expressed in terms of primal constraints, a weighted average across the interface between the subdomains, and local components given in terms of solvers of local subdomain problems. BDDC methods for vector field problems discretized with Raviart-Thomas finite elements are introduced. The methods are based on a new type of weighted average and an adaptive selection of primal constraints, developed to deal with coefficients with high contrast even inside individual subdomains. For problems with very many subdomains, a third level of the preconditioner is introduced. Assuming that the subdomains are all built from elements of a coarse triangulation of the given domain, and that in each subdomain the material parameters are consistent, one obtains a bound for the preconditioned linear system's condition number which is independent of the values and jumps of these parameters across the subdomains' interface. Numerical experiments, using the PETSc library, are also presented which support the theory and show the algorithms' effectiveness even for problems not covered by the theory. Also included are experiments with Brezzi-Douglas-Marini finite-element approximations.
Oh, Duk -Soon; Widlund, Olof B.; Zampini, Stefano; ...
2017-06-21
A novel method for a multi-level hierarchical composite with brick-and-mortar structure
Brandt, Kristina; Wolff, Michael F. H.; Salikov, Vitalij; Heinrich, Stefan; Schneider, Gerold A.
2013-01-01
The fascination with hierarchically structured hard tissues such as enamel or nacre arises from their unique structure-property relationship. Over the last decades this has motivated numerous syntheses of composites mimicking the brick-and-mortar structure of nacre. However, there is still a lack of synthetic engineering materials displaying a true hierarchical structure. Here, we present a novel multi-step processing route for anisotropic two-level hierarchical composites that combines different coating techniques on different length scales. It comprises polymer-encapsulated ceramic particles as building blocks for the first level, followed by spouted bed spray granulation for a second level, and finally directional hot pressing to anisotropically consolidate the composite. The resulting microstructure reveals a brick-and-mortar hierarchical structure with distinct, though not yet optimized, mechanical properties on each level. It opens up a completely new processing route for the synthesis of multi-level hierarchically structured composites, giving prospects for multi-functional structure-property relationships. PMID:23900554
A novel method for a multi-level hierarchical composite with brick-and-mortar structure.
Brandt, Kristina; Wolff, Michael F H; Salikov, Vitalij; Heinrich, Stefan; Schneider, Gerold A
2013-01-01
A novel method for a multi-level hierarchical composite with brick-and-mortar structure
NASA Astrophysics Data System (ADS)
Brandt, Kristina; Wolff, Michael F. H.; Salikov, Vitalij; Heinrich, Stefan; Schneider, Gerold A.
2013-07-01
Two variants of minimum discarded fill ordering
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, E.F.; Forsyth, P.A.; Tang, Wei-Pai
1991-01-01
It is well known that the ordering of the unknowns can have a significant effect on the convergence of Preconditioned Conjugate Gradient (PCG) methods. There has been considerable experimental work on the effects of ordering for regular finite difference problems. In many cases, good results have been obtained with preconditioners based on diagonal, spiral or natural row orderings. However, for finite element problems having unstructured grids or grids generated by a local refinement approach, it is difficult to define many of the orderings used for more regular problems. A recently proposed Minimum Discarded Fill (MDF) ordering technique is effective in finding high quality Incomplete LU (ILU) preconditioners, especially for problems arising from unstructured finite element grids. Testing indicates this algorithm can identify a rather complicated physical structure in an anisotropic problem and orders the unknowns in the "preferred" direction. The MDF technique may be viewed as the numerical analogue of the minimum deficiency algorithm in sparse matrix technology. At any stage of the partial elimination, the MDF technique chooses the next pivot node so as to minimize the amount of discarded fill. In this work, two efficient variants of the MDF technique are explored to produce cost-effective high-order ILU preconditioners. The Threshold MDF orderings combine MDF ideas with drop tolerance techniques to identify the sparsity pattern in the ILU preconditioners. These techniques identify an ordering that encourages fast decay of the entries in the ILU factorization. The Minimum Update Matrix (MUM) ordering technique is a simplification of the MDF ordering and is closely related to the minimum degree algorithm. The MUM ordering is especially suited to large systems arising from Navier-Stokes problems. Some interesting pictures of the orderings are presented using a visualization tool. 22 refs., 4 figs., 7 tabs.
Wu, Chi; Xie, Zuowei; Zhang, Guangzhao; Zi, Guofu; Tu, Yingfeng; Yang, Yali; Cai, Ping; Nie, Ting
2002-12-07
A combination of polymer physics and synthetic chemistry has enabled us to develop self-assembly assisted polymerization (SAAP), leading to the preparation of long multi-block copolymers with an ordered chain sequence and controllable block lengths.
Gaussian curvature analysis allows for automatic block placement in multi-block hexahedral meshing.
Ramme, Austin J; Shivanna, Kiran H; Magnotta, Vincent A; Grosland, Nicole M
2011-10-01
Musculoskeletal finite element analysis (FEA) has been essential to research in orthopaedic biomechanics. The generation of a volumetric mesh is often the most challenging step in a FEA. Hexahedral meshing tools that are based on a multi-block approach rely on the manual placement of building blocks for their mesh generation scheme. We hypothesise that Gaussian curvature analysis could be used to automatically develop a building block structure for multi-block hexahedral mesh generation. The Automated Building Block Algorithm incorporates principles from differential geometry, combinatorics, statistical analysis and computer science to automatically generate a building block structure to represent a given surface without prior information. We have applied this algorithm to 29 bones of varying geometries and successfully generated a usable mesh in all cases. This work represents a significant advancement in automating the definition of building blocks.
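A starting point for the Gaussian curvature analysis mentioned above is the discrete angle-deficit formula on a triangulated surface: the curvature concentrated at a vertex is 2π minus the sum of the incident triangle angles. This is a generic sketch of that one quantity, under the assumption of a closed triangle fan; the published Automated Building Block Algorithm involves considerably more than this.

```python
import math

# Discrete Gaussian curvature via angle deficit at a mesh vertex.

def angle_at(p, q, r):
    """Interior angle at vertex p in triangle (p, q, r)."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    dot = ux * vx + uy * vy + uz * vz
    nu = math.sqrt(ux * ux + uy * uy + uz * uz)
    nv = math.sqrt(vx * vx + vy * vy + vz * vz)
    return math.acos(dot / (nu * nv))

def angle_deficit(vertex, triangles):
    """2*pi minus the sum of incident angles: the discrete curvature."""
    total = sum(angle_at(vertex, q, r) for q, r in triangles)
    return 2 * math.pi - total

# A cube corner: three incident right angles give a deficit of pi/2
# (positive curvature); a flat vertex would give a deficit of zero.
corner = (0.0, 0.0, 0.0)
faces = [((1, 0, 0), (0, 1, 0)), ((0, 1, 0), (0, 0, 1)), ((0, 0, 1), (1, 0, 0))]
print(round(angle_deficit(corner, faces), 6))  # -> 1.570796
```

High-deficit vertices mark geometric features (corners, ridges) that are natural candidates for building-block corner placement.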
Homogeneous-oxide stack in IGZO thin-film transistors for multi-level-cell NAND memory application
NASA Astrophysics Data System (ADS)
Ji, Hao; Wei, Yehui; Zhang, Xinlei; Jiang, Ran
2017-11-01
A nonvolatile charge-trap-flash memory based on amorphous indium-gallium-zinc-oxide thin film transistors was fabricated with a homogeneous-oxide structure for a multi-level-cell application. All oxide layers, i.e., the tunneling layer, charge trapping layer, and blocking layer, were fabricated with Al2O3 films. The fabrication conditions (including temperature and deposition method) of the charge trapping layer were different from those of the other oxide layers. This device demonstrated a considerably large memory window of 4 V between the fully erased and programmed states, with an operating voltage of less than 14 V. This kind of device shows good prospects for multi-level-cell memory applications.
Dreyfuss, Paul; Henning, Troy; Malladi, Niriksha; Goldstein, Barry; Bogduk, Nikolai
2009-01-01
To determine the physiologic effectiveness of multi-site, multi-depth sacral lateral branch injections. Double-blind, randomized, placebo-controlled study. Outpatient pain management center. Twenty asymptomatic volunteers. The dorsal innervation to the sacroiliac joint (SIJ) is from the L5 dorsal ramus and the S1-3 lateral branches. Multi-site, multi-depth lateral branch blocks were developed to compensate for the complex regional anatomy that limited the effectiveness of single-site, single-depth lateral branch injections. Bilateral multi-site, multi-depth lateral branch green dye injections and subsequent dissection on two cadavers revealed a 91% accuracy with this technique. Session 1: 20 asymptomatic subjects had a 25-g spinal needle probe their interosseous (IO) and dorsal sacroiliac (DSI) ligaments. The inferior dorsal SIJ was entered and capsular distension with contrast medium was performed. Discomfort had to occur with each provocation maneuver and a contained arthrogram was necessary to continue in the study. Session 2: 1 week later, computer-randomized, double-blind multi-site, multi-depth lateral branch block injections were performed. Ten subjects received active (bupivacaine 0.75%) and 10 subjects received sham (normal saline) multi-site, multi-depth lateral branch injections. Thirty minutes later, provocation testing was repeated with methodology identical to session 1. Outcome measures: presence or absence of pain on ligamentous probing and SIJ capsular distension. Seventy percent of the active group had insensate IO and DSI ligaments and an insensate inferior dorsal SIJ, vs 0-10% of the sham group. Twenty percent of the active vs 10% of the sham group did not feel repeat capsular distension. Six of seven subjects (86%) retained the ability to feel repeat capsular distension despite an insensate dorsal SIJ complex. Multi-site, multi-depth lateral branch blocks are physiologically effective at a rate of 70%.
Multi-site, multi-depth lateral branch blocks do not effectively block the intra-articular portion of the SIJ. There is physiological evidence that the intra-articular portion of the SIJ is innervated from both ventral and dorsal sources. Comparative multi-site, multi-depth lateral branch blocks should be considered a potentially valuable tool to diagnose extra-articular SIJ pain and to determine whether lateral branch radiofrequency neurotomy may assist patients with SIJ pain.
NASA Astrophysics Data System (ADS)
Chen, Guangye; Chacon, Luis
2015-11-01
We discuss a new, conservative, fully implicit 2D3V Vlasov-Darwin particle-in-cell algorithm in curvilinear geometry for non-radiative, electromagnetic kinetic plasma simulations. Unlike standard explicit PIC schemes, fully implicit PIC algorithms are unconditionally stable and allow exact discrete energy and charge conservation. Here, we extend these algorithms to curvilinear geometry. The algorithm retains its exact conservation properties on curvilinear grids. The nonlinear iteration is effectively accelerated with a fluid preconditioner for weakly to modestly magnetized plasmas, which allows efficient use of large timesteps, O(√(m_i/m_e) c/v_eT) larger than the explicit CFL limit. In this presentation, we will introduce the main algorithmic components of the approach, and demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 1D (slow shock) and 2D (island coalescence).
Adaptive Implicit Non-Equilibrium Radiation Diffusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Philip, Bobby; Wang, Zhen; Berrill, Mark A
2013-01-01
We describe methods for accurate and efficient long term time integration of non-equilibrium radiation diffusion systems: implicit time integration for efficient long term time integration of stiff multiphysics systems, local control theory based step size control to minimize the required global number of time steps while controlling accuracy, dynamic 3D adaptive mesh refinement (AMR) to minimize memory and computational costs, Jacobian Free Newton-Krylov methods on AMR grids for efficient nonlinear solution, and optimal multilevel preconditioner components that provide level independent solver convergence.
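The Jacobian-free Newton-Krylov idea mentioned above rests on the fact that Krylov solvers need only Jacobian-vector products, which can be approximated by a finite difference of the nonlinear residual. A generic sketch of that kernel (not the paper's radiation-diffusion implementation; the residual and names are illustrative):

```python
# Matrix-free Jacobian-vector product: J(u) @ v ~ (F(u + eps*v) - F(u)) / eps.

def jv_approx(residual, u, v, eps=1e-7):
    """Approximate the action of the Jacobian of `residual` at u on v."""
    fu = residual(u)
    up = [ui + eps * vi for ui, vi in zip(u, v)]
    return [(fp - f0) / eps for fp, f0 in zip(residual(up), fu)]

# Toy residual F(u) = (u0^2 - 2, u0*u1), whose exact Jacobian is
# [[2*u0, 0], [u1, u0]].
def F(u):
    return [u[0] ** 2 - 2.0, u[0] * u[1]]

u = [1.0, 3.0]
v = [1.0, 0.0]
print(jv_approx(F, u, v))  # approximately [2.0, 3.0] = J(u) @ v
```

A Newton step then solves J(u) δ = -F(u) with a Krylov method (e.g. GMRES) that calls only `jv_approx`, so the Jacobian is never formed, which is what makes the approach attractive on dynamically refined AMR grids.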
Iterative methods for mixed finite element equations
NASA Technical Reports Server (NTRS)
Nakazawa, S.; Nagtegaal, J. C.; Zienkiewicz, O. C.
1985-01-01
Iterative strategies for the solution of indefinite systems of equations arising from the mixed finite element method are investigated in this paper, with application to linear and nonlinear problems in solid and structural mechanics. The augmented Hu-Washizu form is derived, which is then utilized to construct a family of iterative algorithms using the displacement method as the preconditioner. Two types of iterative algorithms are implemented: constant metric iterations, which do not involve updating the preconditioner; and variable metric iterations, in which the inverse of the preconditioning matrix is updated. A series of numerical experiments is conducted to evaluate the numerical performance with application to linear and nonlinear model problems.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithm is guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. A conjugate gradient algorithm is presented that provides greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inverses in O(log2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
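The diagonal-preconditioner variant described above is the classical Jacobi-preconditioned CG. A minimal serial sketch (the paper's parallel O(log2 n) formulation and mass-matrix specifics are not reproduced here):

```python
# Jacobi (diagonal) preconditioned conjugate gradient for an SPD system.

def pcg_jacobi(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for SPD A (list of lists), preconditioner M = diag(A)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual b - A @ x, x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # apply M^-1
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * Api for pi, Api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg_jacobi(A, b)
print([round(v, 6) for v in x])  # -> [0.090909, 0.636364]
```

The tridiagonal variant only changes the `z = ...` lines: instead of dividing by the diagonal, one solves a tridiagonal system, which is still cheap (and, per the paper, parallelizable in logarithmic time).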
NASA Astrophysics Data System (ADS)
Mercier, Sylvain; Gratton, Serge; Tardieu, Nicolas; Vasseur, Xavier
2017-12-01
Many applications in structural mechanics require the numerical solution of sequences of linear systems typically issued from a finite element discretization of the governing equations on fine meshes. The method of Lagrange multipliers is often used to take into account mechanical constraints. The resulting matrices then exhibit a saddle point structure, and the iterative solution of such preconditioned linear systems is considered challenging. A popular strategy is to combine preconditioning and deflation to yield an efficient method. We propose an alternative that is applicable to the general case and not only to matrices with a saddle point structure. In this approach, we update an existing algebraic or application-based preconditioner, using specific available information that exploits knowledge of an approximate invariant subspace or of matrix-vector products. The resulting preconditioner has the form of a limited memory quasi-Newton matrix and requires a small number of linearly independent vectors. Numerical experiments performed on three large-scale applications in elasticity highlight the relevance of the new approach. We show that the proposed method outperforms the deflation method when considering sequences of linear systems with varying matrices.
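The flavor of correcting an existing preconditioner with a small set of vectors can be sketched with the closely related additive coarse correction M⁻¹ + W (Wᵀ A W)⁻¹ Wᵀ, where W holds an approximate invariant subspace. This is not the paper's limited-memory quasi-Newton formula, only a simplified stand-in; all names here are invented for the sketch.

```python
import numpy as np

# Low-rank update of an existing preconditioner using a few vectors W
# spanning an (approximate) invariant subspace of A.

def subspace_update(apply_M, A, W):
    """Return r -> M^-1 r + W (W^T A W)^-1 W^T r."""
    W = np.asarray(W, dtype=float)
    coarse_inv = np.linalg.inv(W.T @ A @ W)  # small k-by-k problem
    def apply_updated(r):
        # original preconditioner plus a low-rank correction on span(W)
        return apply_M(r) + W @ (coarse_inv @ (W.T @ r))
    return apply_updated

# Toy usage: A is SPD and diagonal, W holds one exact eigenvector, and
# the base preconditioner is the identity. On span(W) the update adds
# the exact inverse action of A.
A = np.diag([2.0, 5.0])
W = [[1.0], [0.0]]
M_upd = subspace_update(lambda r: r.copy(), A, W)
r = np.array([1.0, 0.0])
print(M_upd(r))
```

Because only k extra vectors are stored, the update costs O(nk) per application, which is the appeal of limited-memory schemes across a sequence of systems.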
Multi-block sulfonated poly(phenylene) copolymer proton exchange membranes
Fujimoto, Cy H [Albuquerque, NM; Hibbs, Michael [Albuquerque, NM; Ambrosini, Andrea [Albuquerque, NM
2012-02-07
Improved multi-block sulfonated poly(phenylene) copolymer compositions, methods of making the same, and their use as proton exchange membranes (PEM) in hydrogen fuel cells, direct methanol fuel cells, in electrode casting solutions and electrodes. The multi-block architecture has defined, controllable hydrophobic and hydrophilic segments. These improved membranes have better ion transport (proton conductivity) and water swelling properties.
Bi-level Multi-Source Learning for Heterogeneous Block-wise Missing Data
Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer’s Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified “bi-level” learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches. PMID:23988272
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, Andrew J.
1990-01-01
While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
NASA Astrophysics Data System (ADS)
Lu, C.
2017-12-01
This study utilized field outcrops, thin sections, geochemical data, and GR logging curves to investigate the development model of paleokarst within the Longwangmiao Formation in the Lower Cambrian, western Central Yangtze Block, SW China. The Longwangmiao Formation, which belongs to a third-order sequence, consists of four fourth-order sequences and is located in the uppermost part of the Lower Cambrian. The vertical variations of the δ13C and δ18O values indicate the existence of multi-stage eogenetic karst events. The eogenetic karst event in the uppermost part of the Longwangmiao Formation is recognized by the dripstones developed within paleocaves; a vertical paleoweathering crust with four zones (bedrock, a weak weathering zone, an intense weathering zone and a solution collapsed zone); two generations of calcsparite cement showing bright luminescence and a zonation from nonluminescent to bright to nonluminescent; two types of breccias (matrix-rich clast-supported chaotic breccia and matrix-supported chaotic breccia); and rundkarren. The episodic vertical variations of stratiform dissolution vugs and breccias, together with facies-controlled dissolution and filling features, indicate the development of multi-stage eogenetic karst. The development of the paleokarst model is controlled by multi-level sea-level changes. The long eccentricity cycle dictates the fluctuations of the fourth-order sea level, generating multi-stage eogenetic karst events. The paleokarst model is an important step towards better understanding the link between the probably orbitally forced sea-level oscillations and eogenetic karst in the Lower Cambrian. According to this paleokarst model, hydrocarbon exploration should focus on both the karst highlands and the karst transitional zone.
Design of convolutional tornado code
NASA Astrophysics Data System (ADS)
Zhou, Hui; Yang, Yao; Gao, Hongmin; Tan, Lu
2017-09-01
As a linear block code, the traditional tornado (tTN) code is inefficient in burst-erasure environments, and its multi-level structure may lead to high encoding/decoding complexity. This paper presents a convolutional tornado (cTN) code which is able to improve the burst-erasure protection capability by applying the convolution property to the tTN code, and to reduce computational complexity by abrogating the multi-level structure. The simulation results show that the cTN code can provide better packet loss protection performance with lower computational complexity than the tTN code.
NASA Astrophysics Data System (ADS)
Liu, Jinjie
2017-08-01
In order to fully consider the impact of future policies and technologies on the electricity sales market, improve the efficiency of electricity market operation, and realize the dual goals of power reform and energy saving and emission reduction, this paper uses multi-level decision theory to put forward a double-layer game model that takes ETS and blockchain into consideration. We set the maximization of electricity sales profit as the upper-level objective and establish a game strategy model of electricity purchasing, while we set the maximization of user satisfaction as the lower-level objective and build a choice behavior model based on customer satisfaction. This paper applies the strategy to the simulation of a sales company's transactions, and makes a horizontal comparison with competitors in the same industry as well as a longitudinal comparison of game strategies considering different factors. The results show that the double-layer game model is reasonable and effective: it can significantly improve the efficiency of electricity sales companies and user satisfaction, while promoting new energy consumption and achieving energy-saving emission reduction.
Standardized Modular Power Interfaces for Future Space Exploration Missions
NASA Technical Reports Server (NTRS)
Oeftering, Richard
2015-01-01
Earlier studies show that future human exploration missions are composed of multi-vehicle assemblies with interconnected electric power systems. Some vehicles are often intended to serve as flexible multi-purpose or multi-mission platforms. This drives the need for power architectures that can be reconfigured to support this level of flexibility. Power system development costs can be reduced, program-wide, by utilizing a common set of modular building blocks. Further, there are mission operational and logistics cost benefits of using a common set of modular spares. These benefits are the goals of the Advanced Exploration Systems (AES) Modular Power System (AMPS) project. A common set of modular blocks requires a substantial level of standardization in terms of the electrical, data system, and mechanical interfaces. The AMPS project is developing a set of proposed interface standards that will provide useful guidance for modular hardware developers but not needlessly constrain technology options or limit future growth in capability. In 2015 the AMPS project focused on standardizing the interfaces between the elements of spacecraft power distribution and energy storage. The development of the modular power standard starts with establishing mission assumptions and ground rules to define the design application space. The standards are defined in terms of AMPS objectives including Commonality, Reliability-Availability, Flexibility-Configurability and Supportability-Reusability. The proposed standards are aimed at assembly and sub-assembly level building blocks. AMPS plans to adopt existing standards for spacecraft command and data, software, network interfaces, and electrical power interfaces where applicable. Other standards, including structural encapsulation, heat transfer, and fluid transfer, are governed by launch and spacecraft environments and bound by practical limitations of weight and volume.
Developing these mechanical interface standards is more difficult but is an essential part of defining the physical building blocks of modular power. This presentation describes the AMPS project's progress towards standardized modular power interfaces.
Multi-Bit Quantum Private Query
NASA Astrophysics Data System (ADS)
Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing
2015-09-01
Most existing Quantum Private Query (QPQ) protocols provide only single-bit query service, and thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain it, which is still restricted to the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity and implementability, and can achieve a considerable level of security.
Adaptive mesh refinement and load balancing based on multi-level block-structured Cartesian mesh
NASA Astrophysics Data System (ADS)
Misaka, Takashi; Sasaki, Daisuke; Obayashi, Shigeru
2017-11-01
We developed a framework for a distributed-memory parallel computer that enables dynamic data management for adaptive mesh refinement and load balancing. We employed the simple data structure of the building cube method (BCM), where a computational domain is divided into multi-level cubic domains and each cube has the same number of grid points inside, realising a multi-level block-structured Cartesian mesh. Solution-adaptive mesh refinement, which works efficiently with the help of the dynamic load balancing, was implemented by dividing cubes based on mesh refinement criteria. The framework was investigated with the Laplace equation in terms of adaptive mesh refinement, load balancing and parallel efficiency. It was then applied to the incompressible Navier-Stokes equations to simulate a turbulent flow around a sphere. We considered wall-adaptive cube refinement, where the non-dimensional wall distance y+ near the sphere is used as a criterion for mesh refinement. The result showed that the load imbalance due to y+ adaptive mesh refinement was corrected by the present approach. To utilise the BCM framework more effectively, we also tested cube-wise algorithm switching, where explicit and implicit time integration schemes are switched depending on the local Courant-Friedrichs-Lewy (CFL) condition in each cube.
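A toy sketch of why equal-cost cubes make load balancing straightforward (our own illustration, not the authors' code): refinement replaces a flagged cube with 2^dim children that each carry the same per-cube work, so a balanced partition is simply an even split of the cube list across ranks.

```python
def refine(cubes, flagged, dim=3):
    """Split each flagged cube into 2**dim children (same grid size in each)."""
    out = []
    for c in cubes:
        if c in flagged:
            out.extend((c, i) for i in range(2 ** dim))  # child cubes
        else:
            out.append(c)
    return out

def balance(cubes, nranks):
    """Every cube carries identical work, so round-robin is load-balanced."""
    return [cubes[r::nranks] for r in range(nranks)]

fine = refine([0, 1, 2, 3], flagged={0})   # cube 0 splits into 8 children
parts = balance(fine, nranks=4)
```

Because each cube holds the same number of grid points, the imbalance after redistribution is at most one cube per rank, regardless of how unevenly refinement was triggered.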
Fan, Wei; Shi, Wen; Zhang, Wenting; Jia, Yinnong; Zhou, Zhengyuan; Brusnahan, Susan K; Garrison, Jered C
2016-10-01
This work continues our efforts to improve the diagnostic and radiotherapeutic effectiveness of nanomedicine platforms by developing approaches to reduce the non-target accumulation of these agents. Herein, we developed multi-block HPMA copolymers with backbones that are susceptible to cleavage by cathepsin S, a protease that is abundantly expressed in tissues of the mononuclear phagocyte system (MPS). Specifically, a bis-thiol terminated HPMA telechelic copolymer containing 1,4,7,10-tetraazacyclododecane-1,4,7,10-tetraacetic acid (DOTA) was synthesized by reversible addition-fragmentation chain transfer (RAFT) polymerization. Three maleimide modified linkers with different sequences, including cathepsin S degradable oligopeptide, scramble oligopeptide and oligo ethylene glycol, were subsequently synthesized and used for the extension of the HPMA copolymers by thiol-maleimide click chemistry. All multi-block HPMA copolymers could be labeled with ¹⁷⁷Lu with high labeling efficiency and exhibited high serum stability. In vitro cleavage studies demonstrated highly selective and efficient cathepsin S mediated cleavage of the cathepsin S-susceptible multi-block HPMA copolymer. A modified multi-block HPMA copolymer series capable of Förster Resonance Energy Transfer (FRET) was utilized to investigate the rate of cleavage of the multi-block HPMA copolymers in monocyte-derived macrophages. Confocal imaging and flow cytometry studies revealed substantially higher rates of cleavage for the multi-block HPMA copolymers containing the cathepsin S-susceptible linker. The efficacy of the cathepsin S-cleavable multi-block HPMA copolymer was further examined using an in vivo model of pancreatic ductal adenocarcinoma. Based on the biodistribution and SPECT/CT studies, the copolymer extended with the cathepsin S susceptible linker exhibited significantly faster clearance and lower non-target retention without compromising tumor targeting. 
Overall, these results indicate that exploitation of the cathepsin S activity in MPS tissues can be utilized to substantially lower non-target accumulation, suggesting this is a promising approach for the development of diagnostic and radiotherapeutic nanomedicine platforms.
Nano-structured polymer composites and process for preparing same
Hillmyer, Marc; Chen, Liang
2013-04-16
A process for preparing a polymer composite that includes reacting (a) a multi-functional monomer and (b) a block copolymer comprising (i) a first block and (ii) a second block that includes a functional group capable of reacting with the multi-functional monomer, to form a crosslinked, nano-structured, bi-continuous composite. The composite includes a continuous matrix phase and a second continuous phase comprising the first block of the block copolymer.
NASA Astrophysics Data System (ADS)
Kim, S. C.; Hayter, E. J.; Pruhs, R.; Luong, P.; Lackey, T. C.
2016-12-01
The geophysical-scale circulation of the Mid-Atlantic Bight and hydrologic inputs from adjacent Chesapeake Bay watersheds and tributaries influence the hydrodynamics and transport of the James River estuary. Both barotropic and baroclinic transport govern the hydrodynamics of this partially stratified estuary. Modeling the placement of dredged sediment requires accommodating this wide spectrum of atmospheric and hydrodynamic scales. The Geophysical Scale Multi-Block (GSMB) Transport Modeling System is a collection of multiple well-established and USACE-approved process models. Taking advantage of the parallel computing capability of multi-block modeling, we performed a one-year, three-dimensional simulation of hydrodynamics to support modeling of the transport and morphology changes of dredged sediment placements. Model forcing includes spatially and temporally varying meteorological conditions and hydrological inputs from the watershed. Surface heat flux estimates were derived from the National Solar Radiation Database (NSRDB). The open-water boundary condition for water level was obtained from an ADCIRC model application of the U.S. East Coast. Temperature-salinity boundary conditions were obtained from the Environmental Protection Agency (EPA) Chesapeake Bay Program (CBP) long-term monitoring stations database. Simulated water levels were calibrated and verified by comparison with National Oceanic and Atmospheric Administration (NOAA) tide gage locations. A harmonic analysis of the modeled tides was performed and compared with NOAA tide prediction data. In addition, project-specific circulation was verified using US Army Corps of Engineers (USACE) drogue data. Salinity and temperature transport was verified at seven CBP long-term monitoring stations along the navigation channel. 
Simulation and analysis of model results suggest that GSMB is capable of resolving the long duration, multi-scale processes inherent to practical engineering problems such as dredged material placement stability.
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
Grayver, Alexander V.; Kolev, Tzanio V.
2015-11-01
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest-order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel, distributed, robust, and scalable linear solver based on optimal block-diagonal and auxiliary-space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, large conductivity contrasts, and large numbers of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and fewer DoFs than the lowest-order adaptive FEM.
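The "purely algebraic" property means the solver needs only matrix-vector products and a preconditioner application. A minimal numpy sketch of preconditioned conjugate gradients with a block-diagonal preconditioner conveys the idea (our toy analogue on a small SPD matrix; the paper's solver uses Krylov methods with auxiliary-space components on far larger non-SPD systems, and all names here are ours):

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, maxiter=200):
    """Preconditioned CG for SPD A; M_inv applies the preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(0)
n = 6
A = 4 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = (A + A.T) / 2                # symmetric, diagonally dominant -> SPD
binv = [np.linalg.inv(A[i:i + 3, i:i + 3]) for i in (0, 3)]

def M_inv(r):                    # apply the block-diagonal inverse
    return np.concatenate([binv[0] @ r[:3], binv[1] @ r[3:]])

b = rng.standard_normal(n)
x = pcg(A, b, M_inv)
```

Note that `pcg` never inspects how A was built, mirroring the discretization-agnostic design emphasized in the abstract.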
Gumerov, Nail A; Duraiswami, Ramani
2009-01-01
The development of a fast multipole method (FMM) accelerated iterative solution of the boundary element method (BEM) for the Helmholtz equations in three dimensions is described. The FMM for the Helmholtz equation is significantly different for problems with low and high kD (where k is the wavenumber and D the domain size), and for large problems the method must be switched between levels of the hierarchy. The BEM requires several approximate computations (numerical quadrature, approximations of the boundary shapes using elements), and these errors must be balanced against approximations introduced by the FMM and the convergence criterion for iterative solution. These different errors must all be chosen in a way that, on the one hand, excess work is not done and, on the other, that the error achieved by the overall computation is acceptable. Details of translation operators for low and high kD, choice of representations, and BEM quadrature schemes, all consistent with these approximations, are described. A novel preconditioner using a low accuracy FMM accelerated solver as a right preconditioner is also described. Results of the developed solvers for large boundary value problems with 0.0001 ≲ kD ≲ 500 are presented and shown to perform close to theoretical expectations.
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH
NASA Astrophysics Data System (ADS)
Lee, D.; Gopal, S.; Mohapatra, P.
2012-07-01
We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
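The heart of a JFNK scheme is that each Newton linear system is solved by a Krylov method using only residual evaluations: the Jacobian-vector product is approximated by a finite difference of the residual, so the Jacobian is never formed. The compact sketch below is our own illustration under our own naming (unpreconditioned and on a trivial nonlinear system, unlike the production solver described above):

```python
import numpy as np

def gmres(Av, b, m):
    """Bare-bones GMRES via Arnoldi + small least-squares solve."""
    n = b.size
    m = min(m, n)
    Q = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    k = m
    for j in range(m):
        w = Av(Q[:, j])
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                # happy breakdown
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(k + 1); e1[0] = beta
    y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
    return Q[:, :k] @ y

def jfnk(F, x0, newton_iters=20, eps=1e-7):
    """Newton outer loop; J@v approximated Jacobian-free from residuals."""
    x = x0.astype(float)
    for _ in range(newton_iters):
        Fx = F(x)
        if np.linalg.norm(Fx) < 1e-12:
            break
        Jv = lambda v: (F(x + eps * v) - Fx) / eps   # finite-difference J@v
        x = x + gmres(Jv, -Fx, m=20)                 # inner Krylov solve
    return x

# Simple component-wise nonlinear test system: F(x) = x + 0.1*x**3 - 1
F = lambda x: x + 0.1 * x**3 - 1.0
x = jfnk(F, np.zeros(4))
```

In a production solver such as the one described, the inner Krylov iteration would additionally be preconditioned to keep iteration counts bounded on stiff systems.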
Solving groundwater flow problems by conjugate-gradient methods and the strongly implicit procedure
Hill, Mary C.
1990-01-01
The performance of the preconditioned conjugate-gradient method with three preconditioners is compared with the strongly implicit procedure (SIP) using a scalar computer. The preconditioners considered are the incomplete Cholesky (ICCG) and the modified incomplete Cholesky (MICCG), which require the same computer storage as SIP as programmed for a problem with a symmetric matrix, and a polynomial preconditioner (POLCG), which requires less computer storage than SIP. Although POLCG is usually used on vector computers, it is included here because of its small storage requirements. In this paper, published comparisons of the solvers are evaluated, all four solvers are compared for the first time, and new test cases are presented to provide a more complete basis by which the solvers can be judged for typical groundwater flow problems. Based on nine test cases, the following conclusions are reached: (1) SIP is actually as efficient as ICCG for some of the published, linear, two-dimensional test cases that were reportedly solved much more efficiently by ICCG; (2) SIP is more efficient than other published comparisons would indicate when common convergence criteria are used; and (3) for problems that are three-dimensional, nonlinear, or both, and for which common convergence criteria are used, SIP is often more efficient than ICCG, and is sometimes more efficient than MICCG.
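The storage advantage of a polynomial preconditioner like POLCG comes from its matrix-free form: the preconditioner is a short Neumann-series polynomial in D⁻¹A, so nothing beyond A itself need be stored. A small numpy sketch of the idea (our own construction; the actual POLCG polynomial in the paper may differ):

```python
import numpy as np

def poly_precond(A, K=4):
    """Apply M^{-1} = sum_{k<K} (I - D^{-1}A)^k D^{-1}, matrix-free."""
    d_inv = 1.0 / np.diag(A)
    def apply(r):
        z = np.zeros_like(r)
        s = d_inv * r                  # s_0 = D^{-1} r
        for _ in range(K):
            z += s
            s = s - d_inv * (A @ s)    # s_{k+1} = (I - D^{-1}A) s_k
        return z
    return apply

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Laplacian, SPD
Minv = poly_precond(A)
MinvA = np.column_stack([Minv(A[:, j]) for j in range(n)])
cond_plain = np.linalg.cond(A)
cond_pre = np.linalg.cond(MinvA)
```

The check compares condition numbers: the preconditioned operator M⁻¹A has a much tighter spectrum than A, which is what accelerates conjugate-gradient convergence.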
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering-dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
NASA Astrophysics Data System (ADS)
Chen, Guangye; Chacón, Luis; CoCoMans Team
2014-10-01
For decades, the Vlasov-Darwin model has been recognized to be attractive for PIC simulations (to avoid radiative noise issues) in non-radiative electromagnetic regimes. However, the Darwin model results in elliptic field equations that render explicit time integration unconditionally unstable. Improving on linearly implicit schemes, fully implicit PIC algorithms for both electrostatic and electromagnetic regimes, with exact discrete energy and charge conservation properties, have been recently developed in 1D. This study builds on these recent algorithms to develop an implicit, orbit-averaged, time-space-centered finite difference scheme for the particle-field equations in multiple dimensions. The algorithm conserves energy, charge, and canonical momentum exactly, even with grid packing. A simple fluid preconditioner allows efficient use of large timesteps, O(√(m_i/m_e)·c/v_Te) larger than the explicit CFL limit. We demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 2D3V.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, G., E-mail: gchen@lanl.gov; Chacón, L.; Leibs, C.A.
2014-02-01
A recent proof-of-principle study proposes an energy- and charge-conserving, nonlinearly implicit electrostatic particle-in-cell (PIC) algorithm in one dimension [9]. The algorithm in the reference employs an unpreconditioned Jacobian-free Newton–Krylov method, which ensures nonlinear convergence at every timestep (resolving the dynamical timescale of interest). Kinetic enslavement, which is one key component of the algorithm, not only enables fully implicit PIC as a practical approach, but also allows preconditioning the kinetic solver with a fluid approximation. This study proposes such a preconditioner, in which the linearized moment equations are closed with moments computed from particles. Effective acceleration of the linear GMRES solve is demonstrated, on both uniform and non-uniform meshes. The algorithm performance is largely insensitive to the electron–ion mass ratio. Numerical experiments are performed on a 1D multi-scale ion acoustic wave test problem.
Numerical Upscaling of Solute Transport in Fractured Porous Media Based on Flow Aligned Blocks
NASA Astrophysics Data System (ADS)
Leube, P.; Nowak, W.; Sanchez-Vila, X.
2013-12-01
High-contrast or fractured-porous media (FPM) pose one of the largest unresolved challenges for simulating large hydrogeological systems. The high contrast in advective transport between fast conduits and low-permeability rock matrix, including complex mass transfer processes, leads to the typical complex characteristics of early bulk arrivals and long tailings. Adequate direct representation of FPM requires enormous numerical resolutions. For large scales, e.g. the catchment scale, and when allowing for uncertainty in the fracture network architecture or in matrix properties, computational costs quickly reach an intractable level. In such cases, multi-scale simulation techniques have become useful tools. They allow decreasing the complexity of models by aggregating and transferring their parameters to coarser scales and so drastically reduce the computational costs. However, these advantages come at a loss of detail and accuracy. In this work, we develop and test a new multi-scale or upscaled modeling approach based on block upscaling. The novelty is that individual blocks are defined by and aligned with the local flow coordinates. We choose a multi-rate mass transfer (MRMT) model to represent the remaining sub-block non-Fickian behavior within these blocks on the coarse scale. To make the scale transition simple and to save computational costs, we capture sub-block features by temporal moments (TM) of block-wise particle arrival times to be matched with the MRMT model. By predicting spatial mass distributions of injected tracers in a synthetic test scenario, our coarse-scale solution matches reasonably well with the corresponding fine-scale reference solution. For predicting higher TM-orders (such as arrival time and effective dispersion), the prediction accuracy steadily decreases. This is compensated to some extent by the MRMT model. If the MRMT model becomes too complex, it loses its effect. 
We also found that prediction accuracy is sensitive to the choice of the effective dispersion coefficients and to the block resolution. A key advantage of the flow-aligned blocks is that the small-scale velocity field is reproduced quite accurately on the block scale through their flow alignment. Thus, the block-scale transverse dispersivities remain of similar magnitude to local ones, and they do not have to represent macroscopic uncertainty. Also, the flow-aligned blocks minimize numerical dispersion when solving the large-scale transport problem.
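The scale transition above hinges on temporal moments (TM) of block-wise arrival-time distributions, which are then matched against the MRMT model. A minimal sketch of computing the first few temporal moments of a breakthrough curve on a uniform time grid (our toy; the function and variable names are ours):

```python
import numpy as np

def temporal_moments(t, c):
    """Zeroth moment, mean arrival time, and central 2nd moment of c(t).

    Assumes a uniform time grid and a curve decaying to ~0 at both ends.
    """
    dt = t[1] - t[0]
    m0 = c.sum() * dt                         # total mass
    m1 = (t * c).sum() * dt / m0              # mean arrival time
    m2c = ((t - m1) ** 2 * c).sum() * dt / m0 # spread (effective dispersion)
    return m0, m1, m2c

t = np.linspace(0.0, 10.0, 2001)
c = np.exp(-(t - 3.0) ** 2 / (2 * 0.5 ** 2))  # Gaussian pulse arriving at t = 3
m0, m1, m2c = temporal_moments(t, c)
```

For the Gaussian test pulse the mean arrival time recovers 3.0 and the central second moment recovers the variance 0.25, consistent with the observation above that low-order TMs are predicted robustly while higher orders degrade.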
Sanbonmatsu, David M; Strayer, David L; Medeiros-Ward, Nathan; Watson, Jason M
2013-01-01
The present study examined the relationship between personality and individual differences in multi-tasking ability. Participants enrolled at the University of Utah completed measures of multi-tasking activity, perceived multi-tasking ability, impulsivity, and sensation seeking. In addition, they performed the Operation Span in order to assess their executive control and actual multi-tasking ability. The findings indicate that the persons who are most capable of multi-tasking effectively are not the persons who are most likely to engage in multiple tasks simultaneously. To the contrary, multi-tasking activity as measured by the Media Multitasking Inventory and self-reported cell phone usage while driving were negatively correlated with actual multi-tasking ability. Multi-tasking activity was positively correlated with participants' perceived multi-tasking ability, which was found to be significantly inflated. Participants with a strong approach orientation and a weak avoidance orientation--high levels of impulsivity and sensation seeking--reported greater multi-tasking behavior. Finally, the findings suggest that people often engage in multi-tasking because they are less able to block out distractions and focus on a singular task. Participants with less executive control--low scorers on the Operation Span task and persons high in impulsivity--tended to report higher levels of multi-tasking activity.
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores
NASA Astrophysics Data System (ADS)
Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei
We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
An interactive multi-block grid generation system
NASA Technical Reports Server (NTRS)
Kao, T. J.; Su, T. Y.; Appleby, Ruth
1992-01-01
A grid generation procedure combining interactive and batch grid generation programs was put together to generate multi-block grids for complex aircraft configurations. The interactive section provides the tools for 3D geometry manipulation, surface grid extraction, boundary domain construction for 3D volume grid generation, and block-block relationships and boundary conditions for flow solvers. The procedure improves the flexibility and quality of grid generation to meet the design/analysis requirements.
Software Defined Radio with Parallelized Software Architecture
NASA Technical Reports Server (NTRS)
Heckler, Greg
2013-01-01
This software implements software-defined radio processing over multi-core, multi-CPU systems in a way that maximizes the use of CPU resources in the system. The software treats each processing step in either a communications or navigation modulator or demodulator system as an independent, threaded block. Each threaded block is defined with a programmable number of input or output buffers; these buffers are implemented using POSIX pipes. In addition, each threaded block is assigned a unique thread upon block installation. A modulator or demodulator system is built by assembling the threaded blocks into a flow graph, which connects the processing blocks to accomplish the desired signal processing. This software architecture allows the software to scale effortlessly between single-CPU/single-core computers and multi-CPU/multi-core computers without recompilation. NASA spaceflight and ground communications systems currently rely exclusively on ASICs or FPGAs. This software allows low- and medium-bandwidth (100 bps to ~50 Mbps) software defined radios to be designed and implemented solely in C/C++ software, while lowering development costs and facilitating reuse and extensibility.
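A toy flow graph in the same spirit (our own sketch, not the NASA code, which is C/C++ over POSIX pipes): each processing step is an independent thread, and bounded queues stand in for the pipe buffers connecting the stages.

```python
import queue
import threading

def block(fn, inq, outq):
    """Wrap a processing step as an independent thread between two buffers."""
    def run():
        while True:
            item = inq.get()
            if item is None:          # sentinel: propagate shutdown downstream
                outq.put(None)
                return
            outq.put(fn(item))
    t = threading.Thread(target=run)
    t.start()
    return t

src, mid, snk = queue.Queue(), queue.Queue(), queue.Queue()
t1 = block(lambda x: x * 2, src, mid)   # first stage of the flow graph
t2 = block(lambda x: x + 1, mid, snk)   # second stage, fed by the first

for sample in [1, 2, 3]:
    src.put(sample)
src.put(None)                           # end of stream

out = []
while (v := snk.get()) is not None:
    out.append(v)
t1.join(); t2.join()
```

Because each stage owns its thread and communicates only through buffers, the same graph runs unchanged on one core or many, which is the scaling property the abstract highlights.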
NASA Astrophysics Data System (ADS)
Chui, Siu Lit; Lu, Ya Yan
2004-03-01
Wide-angle full-vector beam propagation methods (BPMs) for three-dimensional wave-guiding structures can be derived on the basis of rational approximants of a square root operator or its exponential (i.e., the one-way propagator). While the less accurate BPM based on the slowly varying envelope approximation can be efficiently solved by the alternating direction implicit (ADI) method, the wide-angle variants involve linear systems that are more difficult to handle. We present an efficient solver for these linear systems that is based on a Krylov subspace method with an ADI preconditioner. The resulting wide-angle full-vector BPM is used to simulate the propagation of wave fields in a Y branch and a taper.
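Wide-angle BPMs replace the square-root operator sqrt(I + X) by a rational approximant. A minimal numerical check of the classic [1/1] Padé approximant (I + 3X/4)(I + X/4)⁻¹ on a small symmetric matrix illustrates the idea (our own demonstration of the approximation, not the paper's solver; the resulting shifted linear systems are what the ADI-preconditioned Krylov method is then used to solve):

```python
import numpy as np

rng = np.random.default_rng(1)
B = 0.05 * rng.standard_normal((5, 5))
X = B + B.T                                  # small symmetric "operator"
I = np.eye(5)

# [1/1] Pade approximant of sqrt(I + X); accurate to O(X^3) for small X.
pade = (I + 0.75 * X) @ np.linalg.inv(I + 0.25 * X)

# Exact operator square root via the eigendecomposition of I + X.
w, V = np.linalg.eigh(I + X)
exact = V @ np.diag(np.sqrt(w)) @ V.T

err = np.linalg.norm(pade - exact)
```

Higher-order approximants widen the accurate angular range, at the cost of linear systems that are harder to solve, motivating the Krylov-with-ADI-preconditioner approach described above.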
Hydrodynamics of suspensions of passive and active rigid particles: a rigid multiblob approach
Usabiaga, Florencio Balboa; Kallemov, Bakytzhan; Delmotte, Blaise; ...
2016-01-12
We develop a rigid multiblob method for numerically solving the mobility problem for suspensions of passive and active rigid particles of complex shape in Stokes flow in unconfined, partially confined, and fully confined geometries. As in a number of existing methods, we discretize rigid bodies using a collection of minimally resolved spherical blobs constrained to move as a rigid body, to arrive at a potentially large linear system of equations for the unknown Lagrange multipliers and rigid-body motions. Here we develop a block-diagonal preconditioner for this linear system and show that a standard Krylov solver converges in a modest number of iterations that is essentially independent of the number of particles. Key to the efficiency of the method is a technique for fast computation of the product of the blob-blob mobility matrix and a vector. For unbounded suspensions, we rely on existing analytical expressions for the Rotne-Prager-Yamakawa tensor combined with a fast multipole method (FMM) to obtain linear scaling in the number of particles. For suspensions sedimented against a single no-slip boundary, we use a direct summation on a graphical processing unit (GPU), which gives quadratic asymptotic scaling with the number of particles. For fully confined domains, such as periodic suspensions or suspensions confined in slit and square channels, we extend a recently developed rigid-body immersed boundary method by B. Kallemov, A. P. S. Bhalla, B. E. Griffith, and A. Donev (Commun. Appl. Math. Comput. Sci. 11 (2016), no. 1, 79-141) to suspensions of freely moving passive or active rigid particles at zero Reynolds number. We demonstrate that the iterative solver for the coupled fluid and rigid-body equations converges in a bounded number of iterations regardless of the system size. 
In our approach, each iteration only requires a few cycles of a geometric multigrid solver for the Poisson equation, and an application of the block-diagonal preconditioner, leading to linear scaling with the number of particles. We optimize a number of parameters in the iterative solvers and apply our method to a variety of benchmark problems to carefully assess the accuracy of the rigid multiblob approach as a function of the resolution. We also model the dynamics of colloidal particles studied in recent experiments, such as passive boomerangs in a slit channel, as well as a pair of non-Brownian active nanorods sedimented against a wall.
On codes with multi-level error-correction capabilities
NASA Technical Reports Server (NTRS)
Lin, Shu
1987-01-01
In conventional coding for error control, all the information symbols of a message are regarded as equally significant, and codes are therefore devised to provide equal protection for each information symbol against channel errors. However, on some occasions, some information symbols in a message are more significant than others. As a result, it is desirable to devise codes with multi-level error-correcting capabilities. Another situation where codes with multi-level error-correcting capabilities are desired is in broadcast communication systems. An m-user broadcast channel has one input and m outputs. The single input and each output form a component channel. The component channels may have different noise levels, and hence the messages transmitted over the component channels require different levels of protection against errors. Block codes with multi-level error-correcting capabilities are also known as unequal error protection (UEP) codes. Structural properties of these codes are derived. Based on these structural properties, two classes of UEP codes are constructed.
NASA Astrophysics Data System (ADS)
Huang, Yan; Wang, Zhihui
2015-12-01
With the development of FPGAs, DSP Builder is widely applied to design system-level algorithms. The CL multi-wavelet is more advanced and effective than scalar wavelets in signal decomposition. Thus, a system for the CL multi-wavelet based on DSP Builder is designed for the first time in this paper. The system mainly contains three parts: a pre-filtering subsystem, a one-level decomposition subsystem and a two-level decomposition subsystem. It can be converted into the hardware language VHDL by the Signal Compiler block, which can be used in Quartus II. Analysis of the energy indicator shows that this system outperforms the Daubechies wavelet in signal decomposition. Furthermore, it has proved to be suitable for the implementation of signal fusion based on SoPC hardware, and it will become a solid foundation in this new field.
Preconditioned conjugate gradient methods for the Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing
1994-01-01
A preconditioned Krylov subspace method (GMRES) is used to solve the linear systems of equations formed at each time-integration step of the unsteady, two-dimensional, compressible Navier-Stokes equations of fluid flow. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux-split formulation. Several preconditioning techniques are investigated to enhance the efficiency and convergence rate of the implicit solver based on the GMRES algorithm. The superiority of the new solver is established by comparisons with a conventional implicit solver, namely line Gauss-Seidel relaxation (LGSR). Computational test results for low-speed flow (incompressible flow over a backward-facing step at Mach 0.1), transonic flow (trailing-edge flow in a transonic turbine cascade), and hypersonic flow (shock-on-shock interactions on a cylindrical leading edge at Mach 6.0) are presented. For the Mach 0.1 case, overall speedup factors of up to 17 (in terms of time-steps) and 15 (in terms of CPU time on a CRAY-YMP/8) are found in favor of the preconditioned GMRES solver, when compared with the LGSR solver. The corresponding speedup factors for the transonic flow case are 17 and 23, respectively. The hypersonic flow case shows slightly lower speedup factors of 9 and 13, respectively. The study of preconditioners conducted in this research reveals that a new LUSGS-type preconditioner is much more efficient than a conventional incomplete LU-type preconditioner.
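A hedged sketch of the general approach of preconditioned GMRES on a sparse linear system: here a 2-D Poisson operator stands in for the linearized flow Jacobian, and SciPy's `spilu` supplies a conventional incomplete-LU preconditioner (the paper's LUSGS-type preconditioner is not reproduced):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

# 2-D Poisson operator as a stand-in for the linearized flow Jacobian
m = 20
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsc()        # 400 x 400 sparse matrix

# Conventional incomplete-LU preconditioner (ILU with drop tolerance)
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)

b = np.ones(A.shape[0])
x, info = gmres(A, b, M=M)          # info == 0 on convergence
```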
An adaptive block-based fusion method with LUE-SSIM for multi-focus images
NASA Astrophysics Data System (ADS)
Zheng, Jianing; Guo, Yongcai; Huang, Yukun
2016-09-01
Because of a lens's limited depth of field, digital cameras cannot acquire an all-in-focus image of objects at varying distances in a scene. Multi-focus image fusion can effectively solve this problem, but block-based fusion methods often suffer from blocking artifacts. An adaptive block-based fusion method based on lifting undistorted-edge structural similarity (LUE-SSIM) is put forward. In this method, the image quality metric LUE-SSIM is first proposed; it utilizes characteristics of the human visual system (HVS) and structural similarity (SSIM) to make the metric consistent with human visual perception. A particle swarm optimization (PSO) algorithm, with LUE-SSIM as the objective function, is used to optimize the block size for constructing the fused image. Experimental results on the LIVE image database show that LUE-SSIM outperforms SSIM in assessing the quality of images with Gaussian defocus blur. A multi-focus image fusion experiment is also carried out to verify the proposed method in terms of visual and quantitative evaluation. The results show that the proposed method performs better than some other block-based methods, especially in reducing blocking artifacts in the fused image, and that it effectively preserves undistorted edge details in the in-focus regions of the source images.
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners
Li, Ruipeng; Saad, Yousef
2017-08-01
This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman-Morrison-Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.
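The Sherman-Morrison-Woodbury identity that underlies the low-rank correction can be checked numerically on a small dense example (the random matrices below are a stand-in, not the paper's DD setting):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
B = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned base matrix
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
A = B - U @ V.T                                   # base matrix minus a rank-k term

# Woodbury: (B - U V^T)^-1 = B^-1 + B^-1 U (I - V^T B^-1 U)^-1 V^T B^-1,
# so inverting A only requires B^-1 and a small k x k "core" inverse
Binv = np.linalg.inv(B)
core = np.linalg.inv(np.eye(k) - V.T @ Binv @ U)
Ainv_smw = Binv + Binv @ U @ core @ V.T @ Binv

err = np.linalg.norm(Ainv_smw @ A - np.eye(n))    # should be near machine precision
```

In the preconditioner, the dense inverses are of course replaced by approximate solves; only the algebraic identity is shown here.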
The preconditioned Gauss-Seidel method faster than the SOR method
NASA Astrophysics Data System (ADS)
Niki, Hiroshi; Kohno, Toshiyuki; Morimoto, Munenori
2008-09-01
In recent years, a number of preconditioners have been applied to linear systems [A.D. Gunawardena, S.K. Jain, L. Snyder, Modified iterative methods for consistent linear systems, Linear Algebra Appl. 154-156 (1991) 123-143; T. Kohno, H. Kotakemori, H. Niki, M. Usui, Improving modified Gauss-Seidel method for Z-matrices, Linear Algebra Appl. 267 (1997) 113-123; H. Kotakemori, K. Harada, M. Morimoto, H. Niki, A comparison theorem for the iterative method with the preconditioner (I+Smax), J. Comput. Appl. Math. 145 (2002) 373-378; H. Kotakemori, H. Niki, N. Okamoto, Accelerated iteration method for Z-matrices, J. Comput. Appl. Math. 75 (1996) 87-97; M. Usui, H. Niki, T. Kohno, Adaptive Gauss-Seidel method for linear systems, Internat. J. Comput. Math. 51 (1994) 119-125 [10
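The (I+S) preconditioner cited above (Gunawardena et al.) admits a compact illustration, assuming a unit-diagonal tridiagonal Z-matrix: premultiplying the system by I+S, where S carries the negated first superdiagonal of A, reduces the Gauss-Seidel iteration count.

```python
import numpy as np

def gauss_seidel_count(A, b, tol=1e-8, maxit=2000):
    # Forward Gauss-Seidel sweeps until the residual norm drops below tol
    x = np.zeros_like(b)
    for k in range(1, maxit + 1):
        for i in range(len(b)):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        if np.linalg.norm(b - A @ x) < tol:
            return k
    return maxit

n = 10
# Unit-diagonal Z-matrix (scaled 1-D Laplacian)
A = np.eye(n) - 0.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
b = A @ np.ones(n)

# Gunawardena-type preconditioner P = I + S, where S holds the
# negated first superdiagonal of A
S = np.diag(-np.diag(A, 1), 1)
P = np.eye(n) + S

plain = gauss_seidel_count(A, b)          # sweeps for the original system
prec = gauss_seidel_count(P @ A, P @ b)   # sweeps for the preconditioned system
```

For this class of matrices, `prec` comes out smaller than `plain`, which is the effect the comparison theorems in the cited papers formalize.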
Design of a 0.13-μm CMOS cascade expandable ΣΔ modulator for multi-standard RF telecom systems
NASA Astrophysics Data System (ADS)
Morgado, Alonso; del Río, Rocío; de la Rosa, José M.
2007-05-01
This paper reports a 130-nm CMOS programmable cascade ΣΔ modulator for multi-standard wireless terminals, capable of operating on three standards: GSM, Bluetooth and UMTS. The modulator is reconfigured at both the architecture and circuit level in order to adapt its performance to the specifications of the different standards with optimized power consumption. The design of the building blocks is based upon a top-down CAD methodology that combines simulation and statistical optimization at different levels of the system hierarchy. Transistor-level simulations show correct operation for all standards, featuring 13-bit, 11.3-bit and 9-bit effective resolution within 200-kHz, 1-MHz and 4-MHz bandwidth, respectively.
2012-10-01
Report covering Mar 2010 - Apr 2012: Implications of Multi-Core Architectures on the Development of... Listed contents include a framework for multicore information flow analysis and a Pentium II block diagram figure.
Coarse mesh and one-cell block inversion based diffusion synthetic acceleration
NASA Astrophysics Data System (ADS)
Kim, Kang-Seog
DSA (Diffusion Synthetic Acceleration) has been developed to accelerate the SN transport iteration. We have developed solution techniques for the diffusion equations of FLBLD (Fully Lumped Bilinear Discontinuous), SCB (Simple Corner Balance) and UCB (Upstream Corner Balance) modified 4-step DSA in x-y geometry. Our first multi-level method includes a block Gauss-Seidel iteration for the discontinuous diffusion equation, uses the continuous diffusion equation derived from the asymptotic analysis, and avoids void cell calculation. We implemented this multi-level procedure and performed model problem calculations. The results showed that the FLBLD, SCB and UCB modified 4-step DSA schemes with this multi-level technique are unconditionally stable and rapidly convergent. We suggested a simplified multi-level technique for FLBLD, SCB and UCB modified 4-step DSA. This new procedure does not include iterations on the diffusion calculation or the residual calculation. Fourier analysis results showed that this new procedure was as rapidly convergent as conventional modified 4-step DSA. We developed new DSA procedures coupled with 1-CI (Cell Block Inversion) transport which can be easily parallelized. We showed that 1-CI based DSA schemes preceded by SI (Source Iteration) are efficient and rapidly convergent for LD (Linear Discontinuous) and LLD (Lumped Linear Discontinuous) in slab geometry and for BLD (Bilinear Discontinuous) and FLBLD in x-y geometry. For 1-CI based DSA without SI in slab geometry, the results showed that this procedure is very efficient and effective for all cases. We also showed that 1-CI based DSA in x-y geometry was not effective for thin mesh spacings, but is effective and rapidly convergent for intermediate and thick mesh spacings. We demonstrated that the diffusion equation discretized on a coarse mesh could be employed to accelerate the transport equation.
Our results showed that coarse mesh DSA is unconditionally stable and is as rapidly convergent as fine mesh DSA in slab geometry. For x-y geometry our coarse mesh DSA is very effective for thin and intermediate mesh spacings independent of the scattering ratio, but is not effective for purely scattering problems and high-aspect-ratio zoning. However, if the scattering ratio is less than about 0.95, this procedure is very effective for all mesh spacings.
Present-day crustal deformation and strain transfer in northeastern Tibetan Plateau
NASA Astrophysics Data System (ADS)
Li, Yuhang; Liu, Mian; Wang, Qingliang; Cui, Duxin
2018-04-01
The three-dimensional present-day crustal deformation and strain partitioning in the northeastern Tibetan Plateau are analyzed using available GPS and precise leveling data. We used the multi-scale wavelet method to analyze strain rates, and the elastic block model to estimate slip rates on the major faults and internal strain within each block. Our results show that shear strain is strongly localized along major strike-slip faults, as expected in the tectonic extrusion model. However, extrusion ends and transfers to crustal contraction near the eastern margin of the Tibetan Plateau. The strain transfer is abrupt along the Haiyuan Fault and diffuse along the East Kunlun Fault. Crustal contraction is spatially correlated with active uplift. The present-day strain is concentrated along major fault zones; however, within many terranes bounded by these faults, intra-block strain is detectable. Terranes with high intra-block strain rates also show strong seismicity. On average, the Ordos and Sichuan blocks show no intra-block strain, but localized strain at the southwestern corner of the Ordos block indicates tectonic encroachment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heroux, Michael Allen
2004-07-01
The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries. AztecOO is a package within Trilinos that enables the use of the Aztec solver library [19] with Epetra [13] objects. AztecOO provides access to Aztec preconditioners and solvers by implementing the Aztec 'matrix-free' interface using Epetra. While Aztec is written in C and procedure-oriented, AztecOO is written in C++ and is object-oriented. In addition to providing access to Aztec capabilities, AztecOO also provides some significant new functionality. In particular, it provides an extensible status-testing capability that allows expression of the sophisticated stopping criteria needed in production use of iterative solvers. AztecOO also provides mechanisms for using Ifpack [2], ML [20] and AztecOO itself as preconditioners.
Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Widlund, Olof B.
2015-06-09
The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations, such as those of linear elasticity and electromagnetics. These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large-scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equations. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver for a coarse model in order to achieve performance that is independent of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.
On adaptive weighted polynomial preconditioning for Hermitian positive definite matrices
NASA Technical Reports Server (NTRS)
Fischer, Bernd; Freund, Roland W.
1992-01-01
The conjugate gradient algorithm for solving Hermitian positive definite linear systems is usually combined with preconditioning in order to speed up convergence. In recent years, there has been a revival of polynomial preconditioning, motivated by the attractive features of the method on modern architectures. Standard techniques for choosing the preconditioning polynomial are based only on bounds for the extreme eigenvalues. Here a different approach is proposed, which aims at adapting the preconditioner to the eigenvalue distribution of the coefficient matrix. The technique is based on the observation that good estimates for the eigenvalue distribution can be derived after only a few steps of the Lanczos process. This information is then used to construct a weight function for a suitable Chebyshev approximation problem. The solution of this problem yields the polynomial preconditioner. In particular, we investigate the use of Bernstein-Szego weights.
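The key observation, that a few Lanczos steps already yield good estimates of the eigenvalue distribution, can be demonstrated on a generic Hermitian positive definite test matrix (the spectrum and step count below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 200, 10                      # matrix size, Lanczos steps

# SPD test matrix with a known, uniformly spread spectrum on [0.1, 10]
evals = np.linspace(0.1, 10.0, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * evals) @ Q.T

# m steps of the Lanczos process with full reorthogonalization
alpha, beta = np.zeros(m), np.zeros(m - 1)
V = np.zeros((n, m))
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
V[:, 0] = v
w = A @ v
alpha[0] = v @ w
w -= alpha[0] * v
for j in range(1, m):
    beta[j - 1] = np.linalg.norm(w)
    v = w / beta[j - 1]
    v -= V[:, :j] @ (V[:, :j].T @ v)   # reorthogonalize against earlier vectors
    v /= np.linalg.norm(v)
    V[:, j] = v
    w = A @ v - beta[j - 1] * V[:, j - 1]
    alpha[j] = v @ w
    w -= alpha[j] * v

# Ritz values: eigenvalues of the small tridiagonal Lanczos matrix
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
ritz = np.linalg.eigvalsh(T)
```

The extreme Ritz values already bracket the spectrum well after only m = 10 steps; a weight function for the Chebyshev approximation problem would then be built from this estimated distribution.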
NASA Astrophysics Data System (ADS)
Zhou, Yuzhi; Wang, Han; Liu, Yu; Gao, Xingyu; Song, Haifeng
2018-03-01
The Kerker preconditioner, based on the dielectric function of the homogeneous electron gas, is designed to accelerate the self-consistent field (SCF) iteration in density functional theory calculations. However, a question remains regarding its applicability to inhomogeneous systems. We develop a modified Kerker preconditioning scheme which captures the long-range screening behavior of inhomogeneous systems and thus improves SCF convergence. Its effectiveness and efficiency are shown by tests on long-z slabs of metals, insulators, and metal-insulator contacts. For situations without a priori knowledge of the system, we design an a posteriori indicator to monitor whether the preconditioner has suppressed charge sloshing during the iterations. Based on this indicator, we demonstrate two schemes of self-adaptive configuration for the SCF iteration.
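A minimal sketch of Kerker-style preconditioning on a 1-D grid, assuming an illustrative screening parameter `q0`: the density residual is damped in reciprocal space by q²/(q²+q0²), which suppresses the long-wavelength modes responsible for charge sloshing.

```python
import numpy as np

n, L = 64, 10.0                            # grid points, box length (illustrative)
q = 2 * np.pi * np.fft.fftfreq(n, d=L / n) # reciprocal-space wave numbers
q0 = 1.5                                   # screening parameter (illustrative)

# Kerker damping factor q^2 / (q^2 + q0^2): ~1 at large q, ~0 at small q
kerker = q**2 / (q**2 + q0**2)
kerker[0] = 0.0                            # q = 0 mode untouched: total charge conserved

def precondition(residual):
    # Damp the density residual mode-by-mode in reciprocal space
    return np.fft.ifft(kerker * np.fft.fft(residual)).real
```

Long-wavelength residual components are strongly damped while short-wavelength components pass almost unchanged, which is exactly the sloshing suppression the abstract refers to.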
Optimal preconditioning of lattice Boltzmann methods
NASA Astrophysics Data System (ADS)
Izquierdo, Salvador; Fueyo, Norberto
2009-09-01
A preconditioning technique to accelerate the simulation of steady-state problems using the single-relaxation-time (SRT) lattice Boltzmann (LB) method was first proposed by Guo et al. [Z. Guo, T. Zhao, Y. Shi, Preconditioned lattice-Boltzmann method for steady flows, Phys. Rev. E 70 (2004) 066706-1]. The key idea in this preconditioner is to modify the equilibrium distribution function in such a way that, by means of a Chapman-Enskog expansion, a time-derivative preconditioner of the Navier-Stokes (NS) equations is obtained. In the present contribution, the optimal values for the free parameter γ of this preconditioner are sought both numerically and theoretically, the latter with the aid of linear-stability analysis and the condition number of the system of NS equations. The influence of the collision operator, single- versus multiple-relaxation-times (MRT), is also studied. Three steady-state laminar test cases are used for validation, namely: the two-dimensional lid-driven cavity, a two-dimensional microchannel and the three-dimensional backward-facing step. Finally, guidelines are suggested for an a priori definition of optimal preconditioning parameters as a function of the Reynolds and Mach numbers. The new optimally preconditioned MRT method derived is shown to improve, simultaneously, the rate of convergence, the stability and the accuracy of the lattice Boltzmann simulations, when compared to the non-preconditioned methods and to the optimally preconditioned SRT one. Additionally, direct time-derivative preconditioning of the LB equation is also studied.
Preconditioned conjugate-gradient methods for low-speed flow calculations
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing
1993-01-01
An investigation is conducted into the viability of using a generalized Conjugate Gradient-like method as an iterative solver to obtain steady-state solutions of very low-speed fluid flow problems. Low-speed flow at Mach 0.1 over a backward-facing step is chosen as a representative test problem. The unsteady form of the two dimensional, compressible Navier-Stokes equations is integrated in time using discrete time-steps. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux split formulation. The new iterative solver is used to solve a linear system of equations at each step of the time-integration. Preconditioning techniques are used with the new solver to enhance the stability and convergence rate of the solver and are found to be critical to the overall success of the solver. A study of various preconditioners reveals that a preconditioner based on the Lower-Upper Successive Symmetric Over-Relaxation iterative scheme is more efficient than a preconditioner based on Incomplete L-U factorizations of the iteration matrix. The performance of the new preconditioned solver is compared with a conventional Line Gauss-Seidel Relaxation (LGSR) solver. Overall speed-up factors of 28 (in terms of global time-steps required to converge to a steady-state solution) and 20 (in terms of total CPU time on one processor of a CRAY-YMP) are found in favor of the new preconditioned solver, when compared with the LGSR solver.
NASA Astrophysics Data System (ADS)
Huang, Xiaomeng; Tang, Qiang; Tseng, Yuheng; Hu, Yong; Baker, Allison H.; Bryan, Frank O.; Dennis, John; Fu, Haohuan; Yang, Guangwen
2016-11-01
In the Community Earth System Model (CESM), the ocean model is computationally expensive for high-resolution grids and is often the least scalable component for high-resolution production experiments. The major bottleneck is that the barotropic solver scales poorly at high core counts. We design a new barotropic solver to accelerate the high-resolution ocean simulation. The novel solver adopts a Chebyshev-type iterative method to reduce the global communication cost in conjunction with an effective block preconditioner to further reduce the iterations. The algorithm and its computational complexity are theoretically analyzed and compared with other existing methods. We confirm the significant reduction of the global communication time with a competitive convergence rate using a series of idealized tests. Numerical experiments using the CESM 0.1° global ocean model show that the proposed approach results in a factor of 1.7 speed-up over the original method with no loss of accuracy, achieving 10.5 simulated years per wall-clock day on 16 875 cores.
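The communication advantage of a Chebyshev-type iteration is that it needs no inner products, hence no global reductions. A serial sketch of the classical Chebyshev iteration (following the textbook formulation, not the CESM implementation) applied to a 1-D Laplacian with exact eigenvalue bounds:

```python
import numpy as np

def chebyshev_solve(A, b, lmin, lmax, iters=200):
    """Classical Chebyshev iteration for SPD A with eigenvalue bounds
    [lmin, lmax]; uses no inner products, hence no global reductions."""
    theta = 0.5 * (lmax + lmin)
    delta = 0.5 * (lmax - lmin)
    sigma = theta / delta
    rho = 1.0 / sigma
    x = np.zeros_like(b)
    r = b - A @ x
    d = r / theta
    for _ in range(iters):
        x = x + d
        r = r - A @ d
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = rho_next * rho * d + (2.0 * rho_next / delta) * r
        rho = rho_next
    return x

# Demo: 1-D Laplacian, whose extreme eigenvalues are known in closed form
n = 30
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lmin = 2 - 2 * np.cos(np.pi / (n + 1))
lmax = 2 - 2 * np.cos(n * np.pi / (n + 1))
b = np.ones(n)
x = chebyshev_solve(A, b, lmin, lmax)
```

In practice the eigenvalue bounds come from estimates of the (preconditioned) operator's spectrum; the block preconditioner mentioned in the abstract is omitted here.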
Parallel iterative solution for h and p approximations of the shallow water equations
Barragy, E.J.; Walters, R.A.
1998-01-01
A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholtz problem for surface elevation and a complex momentum equation for the horizontal velocity. Both equations are nonlinear, and the resulting system is solved using Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries, and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field-scale problem where up to 512 processors are used. © 1998 Elsevier Science Ltd. All rights reserved.
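A minimal sketch of the subdomain-based preconditioning idea, assuming a 1-D model matrix and two overlapping subdomains: each local solve uses SciPy's thresholded incomplete LU (`spilu`, an ILUT-style factorization), combined additively to precondition a biconjugate-gradient-type solver (`bicgstab` stands in here for the paper's PBCG; the under-relaxed interface iteration is omitted):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicgstab, spilu

# 1-D model problem split into two overlapping subdomains
n, overlap = 200, 10
A = sp.diags([-1.0, 2.1, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

half = n // 2
domains = [np.arange(0, half + overlap), np.arange(half - overlap, n)]

# Thresholded incomplete LU factorization of each subdomain block
local_ilu = [spilu(A[d, :][:, d].tocsc(), drop_tol=1e-3) for d in domains]

def additive_schwarz(r):
    # Sum of approximate local solves; overlap entries receive both contributions
    z = np.zeros_like(r)
    for d, fact in zip(domains, local_ilu):
        z[d] += fact.solve(r[d])
    return z

M = LinearOperator((n, n), matvec=additive_schwarz)
b = np.ones(n)
x, info = bicgstab(A, b, M=M)   # info == 0 on convergence
```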
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carleton, James Brian; Parks, Michael L.
Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensional problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to an almost constant number of iterations when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages, as demonstrated by the numerical experiments.
Regularized Generalized Canonical Correlation Analysis
ERIC Educational Resources Information Center
Tenenhaus, Arthur; Tenenhaus, Michel
2011-01-01
Regularized generalized canonical correlation analysis (RGCCA) is a generalization of regularized canonical correlation analysis to three or more sets of variables. It constitutes a general framework for many multi-block data analysis methods. It combines the power of multi-block data analysis methods (maximization of well identified criteria) and…
A Hierarchical Model for Simultaneous Detection and Estimation in Multi-subject fMRI Studies
Degras, David; Lindquist, Martin A.
2014-01-01
In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that allows not only tests of activation but also tests for deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain. PMID:24793829
Intra-class correlation estimates for assessment of vitamin A intake in children.
Agarwal, Girdhar G; Awasthi, Shally; Walter, Stephen D
2005-03-01
In many community-based surveys, multi-level sampling is inherent in the design. In designing these studies, especially to calculate the appropriate sample size, investigators need good estimates of the intra-class correlation coefficient (ICC), along with the cluster size, to adjust for variance inflation due to clustering at each level. The present study used data on the assessment of clinical vitamin A deficiency and intake of vitamin A-rich food in children in a district in India. For the survey, 16 households were sampled from 200 villages nested within eight randomly selected blocks of the district. ICCs and components of variance were estimated from a three-level hierarchical random-effects analysis of variance model. Estimates of ICCs and variance components were obtained at the village and block levels. Between-cluster variation was evident at each level of clustering. In these estimates, ICCs were inversely related to cluster size, but the design effect could be substantial for large clusters. At the block level, most ICC estimates were below 0.07. At the village level, many ICC estimates ranged from 0.014 to 0.45. These estimates may provide useful information for the design of epidemiological studies in which the sampled (or allocated) units range in size from households to large administrative zones.
Research in parallel computing
NASA Technical Reports Server (NTRS)
Ortega, James M.; Henderson, Charles
1994-01-01
This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Software Defined Radio with Parallelized Software Architecture
NASA Technical Reports Server (NTRS)
Heckler, Greg
2013-01-01
This software implements software-defined radio processing over multicore, multi-CPU systems in a way that maximizes the use of CPU resources in the system. The software treats each processing step in either a communications or navigation modulator or demodulator system as an independent, threaded block. Each threaded block is defined with a programmable number of input or output buffers; these buffers are implemented using POSIX pipes. In addition, each threaded block is assigned a unique thread upon block installation. A modulator or demodulator system is built by assembling the threaded blocks into a flow graph, which arranges the processing blocks to accomplish the desired signal processing. This software architecture allows the software to scale effortlessly between single-CPU/single-core computers and multi-CPU/multi-core computers without recompilation. NASA spaceflight and ground communications systems currently rely exclusively on ASICs or FPGAs. This software allows low- and medium-bandwidth (100 bps to approximately 50 Mbps) software-defined radios to be designed and implemented solely in C/C++ software, while lowering development costs and facilitating reuse and extensibility.
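The threaded-block flow-graph architecture can be sketched in a few lines; Python threads and `queue.Queue` buffers stand in here for the POSIX pipes of the C/C++ flight software:

```python
import threading
import queue

# Each processing step is an independent threaded block connected by
# buffers. A small flow graph: source -> scale block -> offset block -> sink.
SENTINEL = None   # end-of-stream marker propagated through the graph

def block(fn, inq, outq):
    # A threaded block: read from the input buffer, process, write downstream
    while True:
        item = inq.get()
        if item is SENTINEL:
            outq.put(SENTINEL)
            return
        outq.put(fn(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=block, args=(lambda x: x * 2, q1, q2)),
    threading.Thread(target=block, args=(lambda x: x + 1, q2, q3)),
]
for t in threads:
    t.start()

for sample in range(5):          # feed five samples into the graph
    q1.put(sample)
q1.put(SENTINEL)

out = []
while (v := q3.get()) is not SENTINEL:
    out.append(v)
for t in threads:
    t.join()
# out == [1, 3, 5, 7, 9]
```

Because each block owns its thread and communicates only through buffers, the same graph runs unchanged on one core or many, which mirrors the scaling claim in the abstract.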
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
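The Newton-Krylov core of NKS, inexact Newton steps with a Krylov inner solve and finite-difference Jacobian-vector products, is available off the shelf; a toy nonlinear system is used below, without the Schwarz preconditioner or upwinding continuation:

```python
import numpy as np
from scipy.optimize import newton_krylov

def F(x):
    # Toy nonlinear system; (1, 1) is one of its roots
    return np.array([x[0]**2 + x[1] - 2.0,
                     x[0] - x[1]**2])

# Inexact Newton with a GMRES inner solve; Jacobian-vector products are
# approximated by finite differences, as in matrix-free Newton-Krylov methods
sol = newton_krylov(F, np.array([1.5, 0.5]), method="gmres", f_tol=1e-10)
```

In the NKS setting, the GMRES inner solve would additionally be preconditioned by the two-level overlapping Schwarz method.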
Zhang, Zheng; Wu, Yuyang; Yu, Feng; Niu, Chaoqun; Du, Zhi; Chen, Yong; Du, Jie
2017-10-01
The construction and self-assembly of DNA building blocks are the foundation of bottom-up development of three-dimensional DNA nanostructures and hydrogels. However, most self-assembly from DNA components is impeded by mishybridized intermediates or thermodynamic instability. To enable rapid, high-yield production of complicated DNA objects without an annealing process, different DNA building blocks (Y-shaped, L- and L'-shaped units) were here assembled in the presence of a cationic comb-type copolymer, poly(L-lysine)-graft-dextran (PLL-g-Dex), under physiological conditions. The results demonstrated that PLL-g-Dex not only significantly promoted the self-assembly of DNA blocks with high efficiency, but also stabilized the assembled multi-level structures, especially promoting formation of the complicated 3D DNA hydrogel. This study develops a novel strategy for rapid and high-yield production of DNA hydrogels, even from unstable building blocks at relatively low DNA concentrations, which could open DNA nanotechnology to more practical applications.
Orientation of airborne laser scanning point clouds with multi-view, multi-scale image blocks.
Rönnholm, Petri; Hyyppä, Hannu; Hyyppä, Juha; Haggrén, Henrik
2009-01-01
Comprehensive 3D modeling of our environment requires integration of terrestrial and airborne data, which is collected, preferably, using laser scanning and photogrammetric methods. However, integration of these multi-source data requires accurate relative orientations. In this article, two methods for solving relative orientation problems are presented. The first method includes registration by minimizing the distances between an airborne laser point cloud and a 3D model. The 3D model was derived from photogrammetric measurements and terrestrial laser scanning points. The first method was used as a reference and for validation. Having completed registration in the object space, the relative orientation between images and laser point cloud is known. The second method utilizes an interactive orientation method between a multi-scale image block and a laser point cloud. The multi-scale image block includes both aerial and terrestrial images. Experiments with the multi-scale image block revealed that the accuracy of a relative orientation increased when more images were included in the block. The orientations of the first and second methods were compared. The comparison showed that correct rotations were the most difficult to detect accurately by using the interactive method. Because the interactive method forces laser scanning data to fit with the images, inaccurate rotations cause corresponding shifts to image positions. However, in a test case, in which the orientation differences included only shifts, the interactive method could solve the relative orientation of an aerial image and airborne laser scanning data repeatedly within a couple of centimeters.
Atmospheric Blocking and Atlantic Multi-Decadal Ocean Variability
NASA Technical Reports Server (NTRS)
Hakkinen, Sirpa; Rhines, Peter B.; Worthen, Denise L.
2011-01-01
Atmospheric blocking over the northern North Atlantic involves isolation of large regions of air from the westerly circulation for 5-14 days or more. In a recent 20th-century atmospheric reanalysis (1,2), winters with more frequent blocking persist over several decades and correspond to a warm North Atlantic Ocean, in phase with Atlantic multi-decadal ocean variability (AMV). Ocean circulation is forced by wind-stress curl and related air/sea heat exchange, and we find that their space-time structure is associated with dominant blocking patterns: weaker ocean gyres and weaker heat exchange contribute to the warm phase of AMV. Increased blocking activity extending from Greenland to the British Isles is evident when the winter blocking days of the cold years (1900-1929) are subtracted from those of the warm years (1939-1968).
A nonrecursive 'Order N' preconditioned conjugate gradient/range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, A. J.; Menon, R.; Sunkel, John
1991-01-01
This paper addresses the requirements of present-day mechanical system simulations for algorithms that induce parallelism on a fine scale, and for transient simulation methods that are automatically load balancing for a wide collection of system topologies and hardware configurations. To this end, a combined range space/preconditioned conjugate gradient formulation of multidegree-of-freedom dynamics is developed which, by employing regular ordering of the system connectivity graph, makes it possible to derive an extremely efficient preconditioner from the range space metric (as opposed to the system coefficient matrix). Because of the effectiveness of the preconditioner, the method can achieve performance rates that depend linearly on the number of substructures. The method, termed 'Order N', does not require the assembly of system mass or stiffness matrices, and is therefore amenable to implementation on workstations. Using this method, a 13-substructure model of the Space Station was constructed.
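For reference, the preconditioned conjugate gradient skeleton that such formulations build on can be sketched as follows. This is a generic PCG with a plain Jacobi (diagonal) preconditioner standing in for the range-space-metric preconditioner described above, which is not reproduced here; the test matrix is invented.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, maxiter=200):
    """Preconditioned conjugate gradient for SPD A; M_inv applies the
    inverse of the preconditioner to a residual vector."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test matrix with a diagonal (Jacobi) preconditioner.
n = 100
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * np.ones((n, n))
b = np.ones(n)
x = pcg(A, b, lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))
```

The quality of `M_inv` is what determines the iteration count; the paper's claim is that a range-space metric yields a far better `M_inv` than the coefficient matrix itself for this class of problems.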
NASA Technical Reports Server (NTRS)
Freund, Roland
1988-01-01
Conjugate gradient type methods are considered for the solution of large linear systems Ax = b with complex coefficient matrices of the type A = T + i(sigma)I, where T is Hermitian and sigma is a real scalar. Three different conjugate gradient type approaches, with iterates defined by a minimal residual property, a Galerkin type condition, and a Euclidean error minimization, respectively, are investigated. In particular, numerically stable implementations based on the ideas behind Paige and Saunders' SYMMLQ and MINRES for real symmetric matrices are proposed. Error bounds for all three methods are derived. It is shown how the special shift structure of A can be preserved by using polynomial preconditioning, and results on the optimal choice of the polynomial preconditioner are given. Also, some numerical experiments for matrices arising from finite difference approximations to the complex Helmholtz equation are reported.
Improved Convergence and Robustness of USM3D Solutions on Mixed-Element Grids
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Diskin, Boris; Thomas, James L.; Frink, Neal T.
2016-01-01
Several improvements to the mixed-element USM3D discretization and defect-correction schemes have been made. A new methodology for nonlinear iterations, called the Hierarchical Adaptive Nonlinear Iteration Method, has been developed and implemented. The Hierarchical Adaptive Nonlinear Iteration Method provides two additional hierarchies around a simple and approximate preconditioner of USM3D. The hierarchies are a matrix-free linear solver for the exact linearization of Reynolds-averaged Navier-Stokes equations and a nonlinear control of the solution update. Two variants of the Hierarchical Adaptive Nonlinear Iteration Method are assessed on four benchmark cases, namely, a zero-pressure-gradient flat plate, a bump-in-channel configuration, the NACA 0012 airfoil, and a NASA Common Research Model configuration. The new methodology provides a convergence acceleration factor of 1.4 to 13 over the preconditioner-alone method representing the baseline solver technology.
Preconditioner-free Wiener filtering with a dense noise matrix
NASA Astrophysics Data System (ADS)
Huffenberger, Kevin M.
2018-05-01
This work extends the Elsner & Wandelt (2013) iterative method for efficient, preconditioner-free Wiener filtering to cases in which the noise covariance matrix is dense, but can be decomposed into a sum whose parts are sparse in convenient bases. The new method, which uses multiple messenger fields, reproduces Wiener-filter solutions for test problems, and we apply it to a case beyond the reach of the Elsner & Wandelt (2013) method. We compute the Wiener-filter solution for a simulated Cosmic Microwave Background (CMB) map that contains spatially varying, uncorrelated noise, isotropic 1/f noise, and large-scale horizontal stripes (like those caused by atmospheric noise). We discuss simple extensions that can filter contaminated modes or inverse-noise-filter the data. These techniques help to address complications in the noise properties of maps from current and future generations of ground-based Microwave Background experiments, like Advanced ACTPol, Simons Observatory, and CMB-S4.
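A minimal 1-D sketch of the single-messenger-field Wiener filter of Elsner & Wandelt (2013), on which the multi-messenger extension above builds: the signal covariance S is taken diagonal in Fourier space, the noise N diagonal in pixel space, and the uniform part tau*I of the noise acts as the messenger between the two bases. The spectrum and noise levels below are invented for illustration, and the result is checked against a dense Wiener solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
k = np.fft.rfftfreq(n)
S = 1.0 / (1e-3 + (2 * np.pi * k) ** 2)            # assumed signal power spectrum
N = 0.5 + 0.4 * np.sin(np.linspace(0, 6, n)) ** 2  # spatially varying noise variance
d = rng.standard_normal(n)                         # stand-in data vector

tau = N.min()                                      # split N = Nbar + tau*I
Nbar = N - tau

s = np.zeros(n)
for _ in range(100):
    # messenger update: every operator here is diagonal in pixel space
    t = (tau * d + Nbar * s) / (Nbar + tau)
    # signal update: S and tau*I are both diagonal in Fourier space
    s = np.fft.irfft(S / (S + tau) * np.fft.rfft(t), n=n)

# check against the dense Wiener solution s_wf = S (S + N)^-1 d
apply_S = lambda x: np.fft.irfft(S * np.fft.rfft(x), n=n)
Smat = np.column_stack([apply_S(e) for e in np.eye(n)])
s_wf = Smat @ np.linalg.solve(Smat + np.diag(N), d)
print(np.max(np.abs(s - s_wf)))
```

No preconditioner appears anywhere: each update is a pointwise division in the basis where the relevant covariances are diagonal, which is exactly the property the dense-noise extension preserves by splitting N into parts that are each sparse in some convenient basis.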
NASA Astrophysics Data System (ADS)
Peng, Ao-Ping; Li, Zhi-Hui; Wu, Jun-Lin; Jiang, Xin-Yu
2016-12-01
Based on previous research on the Gas-Kinetic Unified Algorithm (GKUA) for flows ranging from highly rarefied free-molecule flow through the transition regime to the continuum, a new implicit scheme of the cell-centered finite volume method is presented for directly solving the unified Boltzmann model equation covering various flow regimes. In view of the difficulty of generating a high-quality single-block grid system for complex irregular bodies, a multi-block docking grid generation method is designed on the basis of data transmission between blocks, and the data structure is constructed for processing arbitrary connection relations between blocks with high efficiency and reliability. As a result, the gas-kinetic unified algorithm with the implicit scheme and multi-block docking grids has been established for the first time and used to solve reentry flow problems around multiple bodies covering all flow regimes, with Knudsen numbers ranging from 10 to 3.7E-6. The implicit and explicit schemes are applied to computing and analyzing the supersonic flows in the near-continuum and continuum regimes around a circular cylinder, with careful comparison against each other. It is shown that the present algorithm and modelling possess much higher computational efficiency and faster convergence. Flow problems involving two and three side-by-side cylinders are simulated from highly rarefied to near-continuum flow regimes, and the computed results are found to be in good agreement with related DSMC simulations and theoretical analysis, which verifies the accuracy and reliability of the present method. It is observed that the smaller the spacing of the multi-body system, the greater the obstruction of the cylindrical throat, the more asymmetric the flow field of each single body, and the larger the normal force coefficient. In the near-continuum transitional flow regime of near-space flight conditions, once the spacing increases to six times the diameter of a single body, the interference effects of the multiple bodies become negligible. Computing practice has confirmed that the present method is feasible for computing the aerodynamics and revealing the flow mechanisms around complex multi-body vehicles covering all flow regimes, from the gas-kinetic point of view of solving the unified Boltzmann model velocity distribution function equation.
Level-Set Simulation of Viscous Free Surface Flow Around a Commercial Hull Form
2005-04-15
Abstract: The viscous free surface flow around a 3600 TEU KRISO Container Ship is computed using the finite-volume-based multi-block RANS code WAVIS, developed at KRISO. The free surface is captured with the level-set method and the realizable k-ε model is employed for turbulence closure. The computations are done for a 3600 TEU container ship of the Korea Research Institute of Ships & Ocean Engineering, KORDI (hereafter, KRISO), selected as
NASA Astrophysics Data System (ADS)
Ning, Mengmeng; Che, Hang; Kong, Weizhong; Wang, Peng; Liu, Bingxiao; Xu, Zhengdong; Wang, Xiaochao; Long, Changjun; Zhang, Bin; Wu, Youmei
2017-12-01
The physical characteristics of the Xiliu 10 Block reservoir are poor: it has strong inter-layer reservoir inhomogeneity and high kaolinite content, and the scaling tendency of the fluid is serious, causing high injection-well pressure in the block and difficulty in achieving injection requirements. In the conventional mud acid system used in past acidizing processes, the reaction with minerals is fast, the effective distance is short, and secondary sedimentation occurs easily. On this point, we propose multi-hydrogen acid technology: multi-hydrogen acid releases hydrogen ions by multistage ionization, which react with pore blockages, fillings, and the skeleton with less secondary pollution. The multi-hydrogen acid system has advantages such as moderate reaction speed, deep penetration, low clay corrosion rate, water wetting, and restrained precipitation, and it can achieve plug removal in deep strata. Field application results show that the multi-hydrogen acid plug removal method performs well in the low-permeability reservoir of Block Xiliu 10.
Li, Hongze; Gao, Xiang; Luo, Yingwu
2016-04-07
Multi-shape memory polymers were prepared by the macroscale spatio-assembly of building blocks in this work. The building blocks were methyl acrylate-co-styrene (MA-co-St) copolymers, which have the St-block-(St-random-MA)-block-St tri-block chain sequence. This design ensures that their transition temperatures can be adjusted over a wide range by varying the composition of the middle block. The two St blocks at the chain ends can generate a crosslink network in the final device to achieve strong bonding force between building blocks and the shape memory capacity. Due to their thermoplastic properties, 3D printing was employed for the spatio-assembly to build devices. This method is capable of introducing many transition phases into one device and preparing complicated shapes via 3D printing. The device can perform a complex action via a series of shape changes. Besides, this method can avoid the difficult programing of a series of temporary shapes. The control of intermediate temporary shapes was realized via programing the shapes and locations of building blocks in the final device.
NASA Technical Reports Server (NTRS)
Cannizzaro, Frank E.; Ash, Robert L.
1992-01-01
A state-of-the-art computer code has been developed that incorporates a modified Runge-Kutta time integration scheme, upwind numerical techniques, multigrid acceleration, and multi-block capabilities (RUMM). A three-dimensional thin-layer formulation of the Navier-Stokes equations is employed. For turbulent flow cases, the Baldwin-Lomax algebraic turbulence model is used. Two different upwind techniques are available: van Leer's flux-vector splitting and Roe's flux-difference splitting. Full approximation multigrid plus implicit residual and corrector smoothing were implemented to enhance the rate of convergence. Multi-block capabilities were developed to provide geometric flexibility. This feature allows the developed computer code to accommodate any grid topology or grid configuration with multiple topologies. The results shown in this dissertation were chosen to validate the computer code and display its geometric flexibility, which is provided by the multi-block structure.
Compressed multi-block local binary pattern for object tracking
NASA Astrophysics Data System (ADS)
Li, Tianwen; Gao, Yun; Zhao, Lei; Zhou, Hao
2018-04-01
Both robustness and real-time performance are very important for object tracking in real environments. Trackers based on deep learning have difficulty satisfying the real-time requirements of tracking. Compressive sensing provides technical support for real-time tracking. In this paper, an object is tracked via a multi-block local binary pattern feature. The feature vector is extracted based on the multi-block local binary pattern and compressed via a sparse random Gaussian matrix as the measurement matrix. Experiments showed that the proposed tracker runs in real time and outperforms existing compressive trackers based on Haar-like features on many challenging video sequences, in terms of both accuracy and robustness.
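The two ingredients named above can be sketched independently of any particular tracker. Below, a multi-block LBP code is computed over block means, flattened into a feature vector, and compressed with a sparse random Gaussian measurement matrix; the image patch, block size, sparsity level, and compressed dimension are all invented for illustration.

```python
import numpy as np

def mb_lbp_codes(img, block):
    """Multi-block LBP: average-pool the image into blocks, then compare
    the 8 neighbouring block means against each centre block mean."""
    h, w = img.shape[0] // block, img.shape[1] // block
    means = img[:h * block, :w * block].reshape(h, block, w, block).mean(axis=(1, 3))
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offs):
        neigh = means[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= means[1:-1, 1:-1]).astype(np.uint8) << bit
    return codes

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
feat = mb_lbp_codes(patch, block=4).ravel().astype(float)   # 6*6 = 36 dims

# sparse random Gaussian measurement matrix compresses the feature
m = 10
P = rng.standard_normal((m, feat.size)) * (rng.random((m, feat.size)) < 0.25)
compressed = P @ feat
print(compressed.shape)   # (10,)
```

In a compressive tracker, `P` is drawn once and reused, so each candidate window costs only one sparse matrix-vector product on top of the feature extraction.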
NASA Astrophysics Data System (ADS)
Guo, Tongqing; Chen, Hao; Lu, Zhiliang
2018-05-01
Aiming at extremely large deformation, a novel predictor-corrector-based dynamic mesh method for multi-block structured grid is proposed. In this work, the dynamic mesh generation is completed in three steps. At first, some typical dynamic positions are selected and high-quality multi-block grids with the same topology are generated at those positions. Then, Lagrange interpolation method is adopted to predict the dynamic mesh at any dynamic position. Finally, a rapid elastic deforming technique is used to correct the small deviation between the interpolated geometric configuration and the actual instantaneous one. Compared with the traditional methods, the results demonstrate that the present method shows stronger deformation ability and higher dynamic mesh quality.
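The prediction step, Lagrange interpolation between pre-generated grids sharing one topology, can be sketched as follows. This is an illustrative stand-in only (the grids below are tiny invented coordinate arrays, and the elastic correction step is not reproduced); with three sample positions the interpolation is exact for grids that depend quadratically on the motion parameter.

```python
import numpy as np

def lagrange_mesh(grids, positions, pos):
    """Predict mesh coordinates at parameter `pos` by Lagrange
    interpolation between pre-generated grids with identical topology;
    `positions` holds the parameter value each grid was generated at."""
    mesh = np.zeros_like(grids[0])
    for i, gi in enumerate(grids):
        L = 1.0
        for j, pj in enumerate(positions):
            if j != i:
                L *= (pos - pj) / (positions[i] - pj)
        mesh += L * gi
    return mesh

# Three "grids" whose coordinates are exact quadratic functions of the
# parameter, so quadratic Lagrange interpolation recovers any position.
positions = [0.0, 0.5, 1.0]
base = np.linspace(0.0, 1.0, 11)
grids = [np.stack([base, p**2 * np.ones_like(base)]) for p in positions]
mid = lagrange_mesh(grids, positions, 0.25)
print(np.allclose(mid[1], 0.25**2))   # True
```

The interpolated mesh inherits the quality of the hand-generated sample grids; the elastic correction then only has to absorb the small deviation from the true instantaneous geometry.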
Sound transmission through a poroelastic layered panel
NASA Astrophysics Data System (ADS)
Nagler, Loris; Rong, Ping; Schanz, Martin; von Estorff, Otto
2014-04-01
Multi-layered panels are often used to improve the acoustics in cars, airplanes, rooms, etc. For such applications these panels include porous and/or fibrous layers. The proposed numerical method is an approach to simulate the acoustical behavior of such multi-layered panels. The model assumes plate-like structures and, hence, combines plate theories for the different layers. The poroelastic layer is modelled with a recently developed plate theory. This theory uses a series expansion in the thickness direction with subsequent analytical integration in this direction to reduce the three dimensions to two. The same idea is used to model either air gaps or fibrous layers. The latter are modelled as an equivalent fluid and can be handled like an air gap, i.e., a kind of 'air plate' is used. The coupling of the layers is done by using the series expansion to express the continuity conditions on the surfaces of the plates. The final system is solved with finite elements, where domain decomposition techniques in combination with preconditioned iterative solvers are applied to solve the final system of equations. In a large frequency range, the comparison with measurements shows very good agreement. From the numerical solution process it can be concluded that different preconditioners for the different layers are necessary. Reuse of the Krylov subspace of the iterative solvers pays off if several excitations have to be computed, but not as much in the loop over the frequencies.
Yang, Yong; Tong, Song; Huang, Shuying; Lin, Pan
2014-01-01
This paper presents a novel framework for the fusion of multi-focus images explicitly designed for visual sensor network (VSN) environments. Multi-scale fusion methods can often obtain fused images with good visual effect. However, because of the defects of the fusion rules, it is almost impossible to completely avoid the loss of useful information in the resulting fused images. The proposed fusion scheme can be divided into two processes: initial fusion and final fusion. The initial fusion is based on a dual-tree complex wavelet transform (DTCWT). The Sum-Modified-Laplacian (SML)-based visual contrast and the SML are employed to fuse the low- and high-frequency coefficients, respectively, and an initial composited image is obtained. In the final fusion process, the image block residual technique and consistency verification are used to detect the focused areas, yielding a decision map that guides construction of the final fused image. The performance of the proposed method was extensively tested on a number of multi-focus images, including non-referenced images, referenced images, and images with different noise levels. The experimental results clearly indicate that the proposed method outperformed various state-of-the-art fusion methods, in terms of both subjective and objective evaluations, and is more suitable for VSNs. PMID:25587878
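The SML focus measure at the heart of the scheme is simple to state in code. The sketch below implements the standard Sum-Modified-Laplacian and uses it for a per-pixel focus decision between two invented test images; the DTCWT stage and consistency verification of the full framework are not reproduced.

```python
import numpy as np

def sml(img, step=1):
    """Sum-Modified-Laplacian focus measure: the modified Laplacian
    |2I - I_left - I_right| + |2I - I_up - I_down|, summed over a 3x3 window."""
    p = np.pad(img, step, mode="edge")
    ml = (np.abs(2 * img - p[step:-step, :-2 * step] - p[step:-step, 2 * step:])
          + np.abs(2 * img - p[:-2 * step, step:-step] - p[2 * step:, step:-step]))
    q = np.pad(ml, 1, mode="edge")
    return sum(q[1 + dy:q.shape[0] - 1 + dy, 1 + dx:q.shape[1] - 1 + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))

# Choose, per pixel, the source image that is better focused.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, 1, 1)) / 3.0
decision = sml(sharp) > sml(blurred)
fused = np.where(decision, sharp, blurred)
print(decision.mean() > 0.5)   # the sharp image wins almost everywhere
```

In the paper's pipeline this comparison is applied to block residuals rather than raw pixels, and the resulting decision map is cleaned by consistency verification before the final fusion.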
Beyond Low Rank + Sparse: Multi-scale Low Rank Matrix Decomposition
Ong, Frank; Lustig, Michael
2016-01-01
We present a natural generalization of the recent low rank + sparse matrix decomposition and consider the decomposition of matrices into components of multiple scales. Such a decomposition is well motivated in practice as data matrices often exhibit local correlations in multiple scales. Concretely, we propose a multi-scale low rank modeling that represents a data matrix as a sum of block-wise low rank matrices with increasing scales of block sizes. We then consider the inverse problem of decomposing the data matrix into its multi-scale low rank components and approach the problem via a convex formulation. Theoretically, we show that under various incoherence conditions, the convex program recovers the multi-scale low rank components either exactly or approximately. Practically, we provide guidance on selecting the regularization parameters and incorporate cycle spinning to reduce blocking artifacts. Experimentally, we show that the multi-scale low rank decomposition provides a more intuitive decomposition than conventional low rank methods and demonstrate its effectiveness in four applications, including illumination normalization for face images, motion separation for surveillance videos, multi-scale modeling of dynamic contrast-enhanced magnetic resonance imaging, and collaborative filtering exploiting age information. PMID:28450978
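One ingredient of the model, the block-wise low rank projection at a single scale, can be sketched directly; the convex recovery program and the multi-scale summation are not reproduced here, and the block size, rank, and test matrix below are invented.

```python
import numpy as np

def blockwise_lowrank(X, b, rank):
    """Project X onto the set of matrices whose b-by-b blocks each have
    rank at most `rank`, via truncated SVD of every block."""
    Y = np.zeros_like(X)
    for i in range(0, X.shape[0], b):
        for j in range(0, X.shape[1], b):
            U, s, Vt = np.linalg.svd(X[i:i + b, j:j + b], full_matrices=False)
            s[rank:] = 0                      # keep only the leading singular values
            Y[i:i + b, j:j + b] = (U * s) @ Vt
    return Y

rng = np.random.default_rng(0)
# A matrix that is exactly rank-1 on each 4x4 block is reproduced exactly.
X = np.zeros((8, 8))
for i in range(0, 8, 4):
    for j in range(0, 8, 4):
        X[i:i + 4, j:j + 4] = np.outer(rng.random(4), rng.random(4))
print(np.allclose(blockwise_lowrank(X, 4, 1), X))   # True
```

The multi-scale model stacks such operators over a hierarchy of block sizes, from 1x1 blocks (which reduce to sparsity) up to the full matrix (global low rank), with the convex program allocating the data among the scales.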
A General Sparse Tensor Framework for Electronic Structure Theory
Manzer, Samuel; Epifanovsky, Evgeny; Krylov, Anna I.; ...
2017-01-24
Linear-scaling algorithms must be developed in order to extend the domain of applicability of electronic structure theory to molecules of any desired size. However, the increasing complexity of modern linear-scaling methods makes code development and maintenance a significant challenge. A major contributor to this difficulty is the lack of robust software abstractions for handling block-sparse tensor operations. We therefore report the development of a highly efficient symbolic block-sparse tensor library in order to provide access to high-level software constructs to treat such problems. Our implementation supports arbitrary multi-dimensional sparsity in all input and output tensors. We avoid cumbersome machine-generated code by implementing all functionality as a high-level symbolic C++ language library, and demonstrate that our implementation attains very high performance for linear-scaling sparse tensor contractions.
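To make the core data structure concrete, here is a toy dict-of-blocks sparse contraction in Python. It illustrates the kind of operation such a library must support, namely skipping absent (zero) blocks entirely; the actual C++ library's interface is not reproduced, and all names and block values are invented.

```python
import numpy as np

def block_matmul(A, B):
    """A and B map (block_row, block_col) -> dense block; absent keys
    are zero blocks and contribute nothing to the product."""
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k == k2:
                C[i, j] = C.get((i, j), 0) + a @ b
    return C

blk = np.eye(2)
A = {(0, 0): 2 * blk, (1, 2): blk}
B = {(0, 1): 3 * blk, (2, 0): blk}
C = block_matmul(A, B)
print(sorted(C))   # [(0, 1), (1, 0)]
```

Only two block multiplications are performed out of the nine a dense 3x3-block product would need, which is the source of linear scaling when physical locality keeps most blocks zero.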
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...
2013-01-01
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Lin; Shao, Sihong; E, Weinan
2012-11-06
We present for the first time an efficient iterative method to directly solve the four-component Dirac-Kohn-Sham (DKS) density functional theory. Due to the existence of the negative energy continuum in the DKS operator, existing iterative techniques for solving Kohn-Sham systems cannot be efficiently applied to DKS systems. The key component of our method is a novel filtering step (F) which acts as a preconditioner in the framework of the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The resulting method, dubbed the LOBPCG-F method, is able to compute the desired eigenvalues and eigenvectors in the positive energy band without computing any state in the negative energy band. The LOBPCG-F method introduces mild extra cost compared to the standard LOBPCG method and can be easily implemented. We demonstrate our method in the pseudopotential framework with a planewave basis set which naturally satisfies the kinetic balance prescription. Numerical results for Pt₂, Au₂, TlF, and Bi₂Se₃ indicate that the LOBPCG-F method is a robust and efficient method for investigating the relativistic effect in systems containing heavy elements.
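For orientation, the standard LOBPCG method that the filtered LOBPCG-F variant extends is available in SciPy; a hedged sketch on an invented symmetric stand-in operator with a simple diagonal preconditioner (nothing here models the DKS operator or the filtering step F):

```python
import numpy as np
from scipy.sparse.linalg import lobpcg

rng = np.random.default_rng(0)
n, nev = 200, 4
A = np.diag(np.arange(1.0, n + 1)) + 1e-2 * rng.standard_normal((n, n))
A = (A + A.T) / 2                      # symmetric test operator

M = np.diag(1.0 / np.diag(A))          # simple Jacobi-style preconditioner
X0 = rng.standard_normal((n, nev))     # random initial block of nev vectors
vals, vecs = lobpcg(A, X0, M=M, largest=False, tol=1e-8, maxiter=500)
print(np.sort(vals))
```

LOBPCG iterates on a whole block of vectors at once and accepts the preconditioner as an operator `M`, which is exactly the slot the paper's filtering step occupies.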
NASA Astrophysics Data System (ADS)
Akilan, A.; Nagasubramanian, V.; Chaudhry, A.; Reddy, D. Rajesh; Sudheer Reddy, D.; Usha Devi, R.; Tirupati, T.; Radhadevi, P. V.; Varadan, G.
2014-11-01
Block adjustment is a technique for large-area mapping with images obtained from different remote sensing satellites. The challenge in this process is to handle, at the system level, a huge number of satellite images from different sources with different resolutions and accuracies. This paper explains a system with various tools and techniques to effectively handle the end-to-end chain in large-area mapping and production, with a good level of automation and provisions for intuitive analysis of the final results in 3D and 2D environments. In addition, the interface for using open-source ortho and DEM references, viz. ETM, SRTM, etc., and for displaying ESRI shapes for the image footprints is explained. Rigorous theory, mathematical modelling, workflow automation and sophisticated software engineering tools are included to ensure high photogrammetric accuracy and productivity. Major building blocks of the block adjustment solution, such as the Georeferencing, Geo-capturing and Geo-Modelling tools, are explained in this paper. To provide an optimal bundle block adjustment solution with high-precision results, the system has been optimized in many stages to fully utilize the hardware resources. Robustness is ensured by handling failures in the automatic procedure and saving the process state at every stage for subsequent restoration from the point of interruption. The results obtained from various stages of the system are presented in the paper.
Cosmic Microwave Background Mapmaking with a Messenger Field
NASA Astrophysics Data System (ADS)
Huffenberger, Kevin M.; Næss, Sigurd K.
2018-01-01
We apply a messenger field method to solve the linear minimum-variance mapmaking equation in the context of Cosmic Microwave Background (CMB) observations. In simulations, the method produces sky maps that converge significantly faster than those from a conjugate gradient descent algorithm with a diagonal preconditioner, even though the computational cost per iteration is similar. The messenger method recovers large scales in the map better than conjugate gradient descent, and yields a lower overall χ2. In the single, pencil beam approximation, each iteration of the messenger mapmaking procedure produces an unbiased map, and the iterations become more optimal as they proceed. A variant of the method can handle differential data or perform deconvolution mapmaking. The messenger method requires no preconditioner, but a high-quality solution needs a cooling parameter to control the convergence. We study the convergence properties of this new method and discuss how the algorithm is feasible for the large data sets of current and future CMB experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schiffmann, Florian; VandeVondele, Joost, E-mail: Joost.VandeVondele@mat.ethz.ch
2015-06-28
We present an improved preconditioning scheme for electronic structure calculations based on the orbital transformation method. First, a preconditioner is developed which includes information from the full Kohn-Sham matrix but avoids computationally demanding diagonalisation steps in its construction. This reduces the computational cost of its construction, eliminating a bottleneck in large scale simulations, while maintaining rapid convergence. In addition, a modified form of Hotelling’s iterative inversion is introduced to replace the exact inversion of the preconditioner matrix. This method is highly effective during molecular dynamics (MD), as the solution obtained in earlier MD steps is a suitable initial guess. Filtering small elements during sparse matrix multiplication leads to linear scaling inversion, while retaining robustness, already for relatively small systems. For system sizes ranging from a few hundred to a few thousand atoms, which are typical for many practical applications, the improvements to the algorithm lead to a 2-5 fold speedup per MD step.
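The iterative inversion mentioned above is, in its classical form, the Hotelling-Bodewig iteration X_{k+1} = X_k (2I − A X_k), which converges quadratically to A⁻¹ when the spectral radius of I − A X₀ is below one. The sketch below shows only this base iteration with a classical safe starting guess; the paper's sparsity filtering and MD warm-starts are omitted, and the test matrix is an arbitrary well-conditioned example.

```python
import numpy as np

def hotelling_inverse(A, X0, n_iter=20):
    """Hotelling-Bodewig iteration X_{k+1} = X_k (2I - A X_k):
    converges quadratically to A^{-1} whenever the spectral radius
    of I - A X0 is below one."""
    I = np.eye(A.shape[0])
    X = X0.copy()
    for _ in range(n_iter):
        X = X @ (2.0 * I - A @ X)
    return X

rng = np.random.default_rng(1)
n = 50
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # well-conditioned test matrix
# Classic safe starting guess: X0 = A^T / (||A||_1 * ||A||_inf).
X0 = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
Xinv = hotelling_inverse(A, X0)
```

In an MD setting, X0 would instead be the inverse from the previous step, which is why only a few iterations (and sparse, filtered products) suffice.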
FANS-3D Users Guide (ESTEP Project ER 201031)
2016-08-01
governing laminar and turbulent flows in body-fitted curvilinear grids. The code employs multi-block overset (chimera) grids, including fully matched...governing incompressible flow in body-fitted grids. The code allows for multi-block overset (chimera) grids, which can be fully matched, arbitrarily...interested reader may consult the Chimera Overset Structured Mesh-Interpolation Code (COSMIC) Users’ Manual (Chen, 2009). The input file used for
Simulation requirements for the Large Deployable Reflector (LDR)
NASA Technical Reports Server (NTRS)
Soosaar, K.
1984-01-01
Simulation tools for the Large Deployable Reflector (LDR) are discussed. These tools are often transfer-function equations; however, transfer functions are inadequate to represent time-varying systems with multiple control systems of overlapping bandwidths and multi-input, multi-output features. Frequency-domain approaches are useful design tools, but a full-up simulation is needed. Because a dedicated computer would be required for the high-frequency, multi-degree-of-freedom components encountered, non-real-time simulation is preferred. Large numerical analysis software programs are useful only to receive inputs and provide output to the next block, and should be kept out of the direct simulation loop. The following blocks make up the simulation. The thermal model block is a classical, non-steady-state heat transfer program. The quasistatic block deals with problems associated with rigid-body control of reflector segments. The steady state block assembles data into equations of motion and dynamics. A differential ray trace is obtained to establish the change in wave aberrations. The observation scene is described. The focal-plane module converts the photon intensity impinging on it into electron streams or into permanent film records.
Algorithms for the automatic generation of 2-D structured multi-block grids
NASA Technical Reports Server (NTRS)
Schoenfeld, Thilo; Weinerfelt, Per; Jenssen, Carl B.
1995-01-01
Two different approaches to the fully automatic generation of structured multi-block grids in two dimensions are presented. The work aims to simplify the user interactivity necessary for the definition of a multiple block grid topology. The first approach is based on an advancing front method commonly used for the generation of unstructured grids. The original algorithm has been modified toward the generation of large quadrilateral elements. The second method is based on the divide-and-conquer paradigm with the global domain recursively partitioned into sub-domains. For either method each of the resulting blocks is then meshed using transfinite interpolation and elliptic smoothing. The applicability of these methods to practical problems is demonstrated for typical geometries of fluid dynamics.
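Once the block topology is defined, each block is meshed by transfinite interpolation (TFI) as the abstract notes. A minimal sketch of bilinear TFI in 2D is below; the curved-top test geometry is hypothetical, and the elliptic smoothing pass that would follow in practice is omitted.

```python
import numpy as np

def tfi_2d(bottom, top, left, right):
    """Bilinear transfinite interpolation: fill a structured block from
    its four boundary curves. bottom/top are (n,2) arrays, left/right
    are (m,2) arrays; the shared corner points must match."""
    n, m = len(bottom), len(left)
    u = np.linspace(0.0, 1.0, n)[:, None, None]   # parameter along bottom/top
    v = np.linspace(0.0, 1.0, m)[None, :, None]   # parameter along left/right
    B, T = bottom[:, None, :], top[:, None, :]
    L, R = left[None, :, :], right[None, :, :]
    # Blend opposite edges, then subtract the doubly counted corner terms.
    grid = ((1 - v) * B + v * T + (1 - u) * L + u * R
            - ((1 - u) * (1 - v) * bottom[0] + u * (1 - v) * bottom[-1]
               + (1 - u) * v * top[0] + u * v * top[-1]))
    return grid  # shape (n, m, 2)

# Hypothetical block: unit square with a curved top edge.
n, m = 9, 5
s = np.linspace(0.0, 1.0, n)
t = np.linspace(0.0, 1.0, m)
bottom = np.stack([s, np.zeros(n)], axis=1)
top    = np.stack([s, 1.0 + 0.1 * np.sin(np.pi * s)], axis=1)
left   = np.stack([np.zeros(m), t], axis=1)
right  = np.stack([np.ones(m), t], axis=1)
grid = tfi_2d(bottom, top, left, right)
```

By construction the interpolated grid reproduces all four boundary curves exactly, which is the property that makes TFI a good initial mesh before elliptic smoothing.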
Individual relocation decisions after tornadoes: a multi-level analysis.
Cong, Zhen; Nejat, Ali; Liang, Daan; Pei, Yaolin; Javid, Roxana J
2018-04-01
This study examines how multi-level factors affected individuals' relocation decisions after EF4 and EF5 (Enhanced Fujita Tornado Intensity Scale) tornadoes struck the United States in 2013. A telephone survey was conducted with 536 respondents, including oversampled older adults, one year after these two disaster events. Respondents' addresses were used to associate individual information with block group-level variables recorded by the American Community Survey. Logistic regression revealed that residential damage and homeownership are important predictors of relocation. There was also significant interaction between these two variables, indicating less difference between homeowners and renters at higher damage levels. Homeownership diminished the likelihood of relocation among younger respondents. Random effects logistic regression found that the percentage of homeownership and of higher income households in the community buffered the effect of damage on relocation; the percentage of older adults reduced the likelihood of this group relocating. The findings are assessed from the standpoint of age difference, policy implications, and social capital and vulnerability.
Yamaguchi, Satoshi; Inoue, Sayuri; Sakai, Takahiko; Abe, Tomohiro; Kitagawa, Haruaki; Imazato, Satoshi
2017-05-01
The objective of this study was to assess in silico the effect of silica nano-filler particle diameter in a computer-aided design/manufacturing (CAD/CAM) composite resin (CR) block on multi-scale physical properties. CAD/CAM CR blocks were modeled, consisting of silica nano-filler particles (20, 40, 60, 80, and 100 nm) and matrix (Bis-GMA/TEGDMA), with a filler volume content of 55.161%. Young's moduli and Poisson's ratios of the block at the macro-scale were calculated by homogenization analysis. Macro-scale CAD/CAM CR blocks (3 × 3 × 3 mm) were modeled and compressive strengths were defined when the fracture loads exceeded 6075 N. MPS values of the nano-scale models were compared by localization analysis. As the filler size decreased, Young's moduli and compressive strength increased, while Poisson's ratios and MPS decreased. All parameters were significantly correlated with filler particle diameter (Pearson's correlation test, r = -0.949, 0.943, -0.951 and 0.976, respectively; p < 0.05). The in silico multi-scale model established in this study demonstrates that the Young's moduli, Poisson's ratios, and compressive strengths of CAD/CAM CR blocks can be enhanced by loading silica nano-filler particles of smaller diameter. CAD/CAM CR blocks using smaller silica nano-filler particles thus have the potential to increase fracture resistance.
NASA Astrophysics Data System (ADS)
Bousserez, Nicolas; Henze, Daven; Bowman, Kevin; Liu, Junjie; Jones, Dylan; Keller, Martin; Deng, Feng
2013-04-01
This work presents improved analysis error estimates for 4D-Var systems. From operational NWP models to top-down constraints on trace gas emissions, many of today's data assimilation and inversion systems in atmospheric science rely on variational approaches. This success is due to both the mathematical clarity of these formulations and the availability of computationally efficient minimization algorithms. However, unlike Kalman Filter-based algorithms, these methods do not provide an estimate of the analysis or forecast error covariance matrices, these error statistics being propagated only implicitly by the system. From both a practical (cycling assimilation) and scientific perspective, assessing uncertainties in the solution of the variational problem is critical. For large-scale linear systems, deterministic or randomization approaches can be considered based on the equivalence between the inverse Hessian of the cost function and the covariance matrix of analysis error. For perfectly quadratic systems, like incremental 4D-Var, Lanczos/Conjugate-Gradient algorithms have proven to be most efficient in generating low-rank approximations of the Hessian matrix during the minimization. For weakly non-linear systems though, the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS), a quasi-Newton descent algorithm, is usually considered the best method for the minimization. Suitable for large-scale optimization, this method allows one to generate an approximation to the inverse Hessian using the latest m vector/gradient pairs generated during the minimization, m depending upon the available core memory. At each iteration, an initial low-rank approximation to the inverse Hessian has to be provided, which is called preconditioning. The ability of the preconditioner to retain useful information from previous iterations largely determines the efficiency of the algorithm. 
Here we assess the performance of different preconditioners for estimating the inverse Hessian of a large-scale 4D-Var system. The impact of using the diagonal preconditioners proposed by Gilbert and Lemaréchal (1989) instead of the usual Oren-Spedicato scalar will be presented first. We will also introduce new hybrid methods that combine randomization estimates of the analysis error variance with L-BFGS diagonal updates to improve the inverse Hessian approximation. Results from these new algorithms will be evaluated against standard large-ensemble Monte Carlo simulations. The methods explored here are applied to the problem of inferring global atmospheric CO2 fluxes using remote sensing observations, and are intended to be integrated with the future NASA Carbon Monitoring System.
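The mechanism being tuned above is the L-BFGS two-loop recursion, in which the initial inverse-Hessian guess H0 (the "preconditioner") enters exactly once. The sketch below shows that recursion with a diagonal H0 and a purely illustrative driver on a small quadratic with exact line search; it is a generic textbook form, not the 4D-Var implementation discussed in the abstract.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list, h0_diag):
    """L-BFGS two-loop recursion: apply the implicit inverse-Hessian
    approximation to `grad`. `h0_diag` is the initial diagonal guess
    (the preconditioner); s_k = x_{k+1}-x_k, y_k = g_{k+1}-g_k."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = h0_diag * q                      # H0 is applied here
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ r)
        r += (a - b) * s
    return r                             # approximately H^{-1} grad

# Illustrative driver: minimize 0.5*x'Ax - b'x with exact line search.
A = np.diag([1.0, 2.0, 5.0, 10.0])
b = np.ones(4)
x = np.zeros(4)
s_list, y_list = [], []
g = A @ x - b
for _ in range(50):
    if np.linalg.norm(g) < 1e-10:
        break
    d = lbfgs_direction(g, s_list[-5:], y_list[-5:], np.ones(4))
    step = (g @ d) / (d @ A @ d)          # exact line search for a quadratic
    x_new = x - step * d
    g_new = A @ x_new - b
    s_list.append(x_new - x)
    y_list.append(g_new - g)
    x, g = x_new, g_new
```

Replacing `np.ones(4)` with a better diagonal (e.g., an estimate of the analysis error variance, as the abstract proposes) changes only the `r = h0_diag * q` line, which is what makes hybrid preconditioning strategies easy to slot in.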
Expressing Parallelism with ROOT
NASA Astrophysics Data System (ADS)
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Atmospheric Blocking and Atlantic Multi-Decadal Ocean Variability
NASA Technical Reports Server (NTRS)
Haekkinen, Sirpa; Rhines, Peter B.; Worthen, Denise L.
2011-01-01
Based on the 20th century atmospheric reanalysis, winters with more frequent blocking, in a band of blocked latitudes from Greenland to Western Europe, are found to persist over several decades and correspond to a warm North Atlantic Ocean, in-phase with Atlantic multi-decadal ocean variability. Atmospheric blocking over the northern North Atlantic, which involves isolation of large regions of air from the westerly circulation for 5 days or more, influences fundamentally the ocean circulation and upper ocean properties by impacting wind patterns. Winters with clusters of more frequent blocking between Greenland and western Europe correspond to a warmer, more saline subpolar ocean. The correspondence between blocked westerly winds and warm ocean holds in recent decadal episodes (especially, 1996-2010). It also describes much longer-timescale Atlantic multidecadal ocean variability (AMV), including the extreme, pre-greenhouse-gas, northern warming of the 1930s-1960s. The space-time structure of the wind forcing associated with a blocked regime leads to weaker ocean gyres and weaker heat-exchange, both of which contribute to the warm phase of AMV.
Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism: the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single-system-image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
NASA Astrophysics Data System (ADS)
Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain
2014-11-01
Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper elaborates on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid GPU/CPU cluster nodes. Discussion focuses on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed on a simple, unpartitioned grid topology. Additional discussion elaborates on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.
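The load-balancing problem above (assign blocks of unequal cell counts to ranks so no rank is overloaded) has a classic greedy baseline, longest-processing-time (LPT) scheduling. The sketch below is that baseline, not NUMECA's meta-block algorithm; it also illustrates why virtual partitioning helps, since a single block larger than the average load cannot be balanced without splitting.

```python
import heapq

def balance_blocks(block_sizes, n_ranks):
    """Greedy LPT scheduling: assign each block (by cell count, largest
    first) to the currently least-loaded rank. Returns a dict mapping
    block id to rank. Blocks larger than the mean load would need to be
    split (virtual partitions) before a good balance is possible."""
    heap = [(0, r) for r in range(n_ranks)]        # (current load, rank)
    heapq.heapify(heap)
    assignment = {}
    for bid, size in sorted(enumerate(block_sizes),
                            key=lambda kv: kv[1], reverse=True):
        load, r = heapq.heappop(heap)              # least-loaded rank
        assignment[bid] = r
        heapq.heappush(heap, (load + size, r))
    return assignment

# Hypothetical block sizes (in thousands of cells) on 3 ranks.
sizes = [90, 70, 40, 40, 30, 20, 10]
asg = balance_blocks(sizes, 3)
loads = [sum(s for b, s in enumerate(sizes) if asg[b] == r) for r in range(3)]
```

For this example the greedy pass reaches a perfect balance of 100 per rank; real meshes rarely cooperate so neatly, which motivates partitioning blocks virtually before assignment.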
Additive Manufacturing of Molds for Fabrication of Insulated Concrete Block
DOE Office of Scientific and Technical Information (OSTI.GOV)
Love, Lonnie J.; Lloyd, Peter D.
ORNL worked with concrete block manufacturer NRG Insulated Block to demonstrate additive manufacturing of a multi-component block mold for its line of insulated blocks. Solid models of the mold parts were constructed from existing two-dimensional drawings, and the parts were fabricated on a Stratasys Fortus 900 using ULTEM 9085. Block mold parts were delivered to NRG and installed on one of their fabrication lines. While form and fit were acceptable, the molds failed to function during NRG's testing.
Layout optimization with algebraic multigrid methods
NASA Technical Reports Server (NTRS)
Regler, Hans; Ruede, Ulrich
1993-01-01
Finding the optimal position for the individual cells (also called functional modules) on the chip surface is an important and difficult step in the design of integrated circuits. This paper deals with the problem of relative placement, that is the minimization of a quadratic functional with a large, sparse, positive definite system matrix. The basic optimization problem must be augmented by constraints to inhibit solutions where cells overlap. Besides classical iterative methods, based on conjugate gradients (CG), we show that algebraic multigrid methods (AMG) provide an interesting alternative. For moderately sized examples with about 10000 cells, AMG is already competitive with CG and is expected to be superior for larger problems. Besides the classical 'multiplicative' AMG algorithm where the levels are visited sequentially, we propose an 'additive' variant of AMG where levels may be treated in parallel and that is suitable as a preconditioner in the CG algorithm.
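The additive-AMG-as-preconditioner idea above plugs into the standard preconditioned conjugate gradient loop. The sketch below shows that loop with the preconditioner left as a callable; a Jacobi (diagonal) stand-in and a 1D Laplacian test matrix are used for illustration, where an AMG V-cycle would be applied in the real placement problem.

```python
import numpy as np

def pcg(A, b, precond, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for SPD A. `precond(r)`
    applies M^{-1} to a residual; an additive AMG cycle would slot in
    here in place of the Jacobi stand-in used below."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1D Laplacian as a small SPD stand-in for the placement system matrix.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, precond=lambda r: r / np.diag(A))   # Jacobi preconditioner
```

The only requirement on the preconditioner is that it be SPD, which is why the "additive" AMG variant (levels combined in parallel) is admissible inside CG while a nonsymmetric cycle would not be.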
Fatigue Life Estimation under Cumulative Cyclic Loading Conditions
NASA Technical Reports Server (NTRS)
Kalluri, Sreeramesh; McGaw, Michael A; Halford, Gary R.
1999-01-01
The cumulative fatigue behavior of a cobalt-base superalloy, Haynes 188, was investigated at 760 °C in air. Initially, strain-controlled tests were conducted on solid cylindrical gauge-section specimens of Haynes 188 under fully-reversed as well as tensile and compressive mean strain-controlled conditions. Fatigue data from these tests were used to establish the baseline fatigue behavior of the alloy with 1) a total strain range type fatigue life relation and 2) the Smith-Watson-Topper (SWT) parameter. Subsequently, two-load-level multi-block fatigue tests were conducted on similar specimens of Haynes 188 at the same temperature. Fatigue lives of the multi-block tests were estimated with 1) the Linear Damage Rule (LDR) and 2) the nonlinear Damage Curve Approach (DCA), both with and without consideration of the mean stresses generated during the cumulative fatigue tests. Fatigue life predictions by the nonlinear DCA were much closer to the experimentally observed lives than those obtained by the LDR. In the presence of mean stresses, the SWT parameter estimated the fatigue lives more accurately under tensile conditions than under compressive conditions.
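The two life-estimation rules compared above are simple to state for a two-level block test. The LDR is Miner's rule (damage fractions add linearly); for the DCA we use the Manson-Halford power-law form, which is an assumption here since the abstract does not give the expression, and mean-stress corrections are omitted.

```python
def ldr_remaining_fraction(n1, N1):
    """Linear Damage Rule (Miner's rule): damage adds linearly, so the
    remaining life fraction at the second load level is 1 - n1/N1."""
    return 1.0 - n1 / N1

def dca_remaining_fraction(n1, N1, N2, exponent=0.4):
    """Damage Curve Approach, Manson-Halford form (assumed here):
    n2/N2 = 1 - (n1/N1)**((N1/N2)**exponent). For high-to-low loading
    (N1 < N2) this predicts less remaining life than the LDR."""
    return 1.0 - (n1 / N1) ** ((N1 / N2) ** exponent)

# Hypothetical example: 40% of life consumed at a severe level
# (N1 = 1e3 cycles), then cycling continues at a milder level (N2 = 1e5).
ldr = ldr_remaining_fraction(400, 1e3)
dca = dca_remaining_fraction(400, 1e3, 1e5)
```

The nonlinearity is the whole point: for high-to-low load sequences the DCA predicts substantially less remaining life than Miner's rule, which is the direction of the experimental result reported above.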
Bi, Sheng; Zeng, Xiao; Tang, Xin; Qin, Shujia; Lai, King Wai Chiu
2016-01-01
Compressive sensing (CS) theory has opened up new paths for the development of signal processing applications. Based on this theory, a novel single-pixel camera architecture has been introduced to overcome the current limitations and challenges of traditional focal plane arrays. However, video quality based on this method is limited by existing acquisition and recovery methods, and the method is also time-consuming. In this paper, a multi-frame motion estimation algorithm is proposed for CS video to enhance video quality. The proposed algorithm uses multiple frames to implement motion estimation. Experimental results show that using multi-frame motion estimation can improve the quality of recovered videos. To further reduce the motion estimation time, a block-matching algorithm is used for motion estimation. Experiments demonstrate that using the block-matching algorithm can reduce motion estimation time by 30%.
Quantum interference between transverse spatial waveguide modes.
Mohanty, Aseema; Zhang, Mian; Dutt, Avik; Ramelow, Sven; Nussenzveig, Paulo; Lipson, Michal
2017-01-20
Integrated quantum optics has the potential to markedly reduce the footprint and resource requirements of quantum information processing systems, but its practical implementation demands broader utilization of the available degrees of freedom within the optical field. To date, integrated photonic quantum systems have primarily relied on path encoding. However, in the classical regime, the transverse spatial modes of a multi-mode waveguide have been easily manipulated using the waveguide geometry to densely encode information. Here, we demonstrate quantum interference between the transverse spatial modes within a single multi-mode waveguide using quantum circuit-building blocks. This work shows that spatial modes can be controlled to an unprecedented level and have the potential to enable practical and robust quantum information processing.
Millard, Daniel; Dang, Qianyu; Shi, Hong; Zhang, Xiaou; Strock, Chris; Kraushaar, Udo; Zeng, Haoyu; Levesque, Paul; Lu, Hua-Rong; Guillon, Jean-Michel; Wu, Joseph C; Li, Yingxin; Luerman, Greg; Anson, Blake; Guo, Liang; Clements, Mike; Abassi, Yama A; Ross, James; Pierson, Jennifer; Gintant, Gary
2018-04-27
Recent in vitro cardiac safety studies demonstrate the ability of human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) to detect electrophysiologic effects of drugs. However, variability contributed by unique approaches, procedures, cell lines and reagents across laboratories makes comparisons of results difficult, leading to uncertainty about the role of hiPSC-CMs in defining proarrhythmic risk in drug discovery and regulatory submissions. A blinded pilot study was conducted to evaluate the electrophysiologic effects of eight well-characterized drugs on four cardiomyocyte lines using a standardized protocol across three microelectrode array (MEA) platforms (18 individual studies). Drugs were selected to define assay sensitivity of prominent repolarizing currents (E-4031 for IKr, JNJ303 for IKs) and depolarizing currents (nifedipine for ICaL, mexiletine for INa) as well as drugs affecting multi-channel block (flecainide, moxifloxacin, quinidine, and ranolazine). Inclusion criteria for final analysis was based on demonstrated sensitivity to IKr block (20% prolongation with E-4031) and L-type calcium current block (20% shortening with nifedipine). Despite differences in baseline characteristics across cardiomyocyte lines, multiple sites and instrument platforms, 10 of 18 studies demonstrated adequate sensitivity to IKr block with E-4031 and ICaL block with nifedipine for inclusion in the final analysis. Concentration-dependent effects on repolarization were observed with this qualified dataset consistent with known ionic mechanisms of single and multi-channel blocking drugs. hiPSC-CMs can detect repolarization effects elicited by single and multi-channel blocking drugs after defining pharmacologic sensitivity to IKr and ICaL block, supporting further validation efforts using hiPSC-CMs for cardiac safety studies.
Multi-level bandwidth efficient block modulation codes
NASA Technical Reports Server (NTRS)
Lin, Shu
1989-01-01
The multilevel technique is investigated for combining block coding and modulation. There are four parts. In the first part, a formulation is presented for signal sets on which modulation codes are to be constructed. Distance measures on a signal set are defined and their properties are developed. In the second part, a general formulation is presented for multilevel modulation codes in terms of component codes with appropriate Euclidean distances. The distance properties, Euclidean weight distribution and linear structure of multilevel modulation codes are investigated. In the third part, several specific methods for constructing multilevel block modulation codes with interdependency among component codes are proposed. Given a multilevel block modulation code C with no interdependency among the binary component codes, the proposed methods give a multilevel block modulation code C′ which has the same rate as C, a minimum squared Euclidean distance not less than that of C, a trellis diagram with the same number of states as that of C, and a smaller number of nearest-neighbor codewords than C. In the last part, the error performance of block modulation codes is analyzed for an AWGN channel based on soft-decision maximum likelihood decoding. Error probabilities of some specific codes are evaluated based on their Euclidean weight distributions and simulation results.
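The central quantity above, the minimum squared Euclidean distance of a multilevel code, can be computed by brute force for a toy example. The sketch below builds a two-level code over 4-PSK (a (4,1) repetition code at level 1, a (4,3) parity-check code at level 2, natural set-partition labelling); this is a standard illustration, not one of the paper's constructions.

```python
import itertools
import numpy as np

# 4-PSK points labelled by two bits via the natural/set-partition map:
# index = b1 + 2*b2, point = exp(j*pi/2*index).
points = np.exp(1j * np.pi / 2 * np.arange(4))

# Level-1 component: (4,1) repetition code (Hamming distance 4).
# Level-2 component: (4,3) single parity-check code (Hamming distance 2).
C1 = [(0, 0, 0, 0), (1, 1, 1, 1)]
C2 = [c for c in itertools.product((0, 1), repeat=4) if sum(c) % 2 == 0]

def modulate(c1, c2):
    """Map a pair of binary component codewords to a 4-PSK sequence."""
    return points[[a + 2 * b for a, b in zip(c1, c2)]]

codewords = [modulate(c1, c2) for c1 in C1 for c2 in C2]
d2min = min(np.sum(np.abs(x - y) ** 2)
            for x, y in itertools.combinations(codewords, 2))
```

With this labelling a level-1 bit flip moves to an adjacent point (squared distance 2) and a level-2 flip to the antipodal point (squared distance 4), so the multilevel bound min(4·2, 2·4) = 8 is met with equality by the exhaustive search.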
Multiframe video coding for improved performance over wireless channels.
Budagavi, M; Gibson, J D
2001-01-01
We propose and evaluate a multi-frame extension to block motion compensation (BMC) coding of videoconferencing-type video signals for wireless channels. The multi-frame BMC (MF-BMC) coder makes use of the redundancy that exists across multiple frames in typical videoconferencing sequences to achieve additional compression over that obtained by using the single frame BMC (SF-BMC) approach, such as in the base-level H.263 codec. The MF-BMC approach also has an inherent ability of overcoming some transmission errors and is thus more robust when compared to the SF-BMC approach. We model the error propagation process in MF-BMC coding as a multiple Markov chain and use Markov chain analysis to infer that the use of multiple frames in motion compensation increases robustness. The Markov chain analysis is also used to devise a simple scheme which randomizes the selection of the frame (amongst the multiple previous frames) used in BMC to achieve additional robustness. The MF-BMC coders proposed are a multi-frame extension of the base level H.263 coder and are found to be more robust than the base level H.263 coder when subjected to simulated errors commonly encountered on wireless channels.
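The kernel of any BMC scheme, single- or multi-frame, is the block search itself: for each block, find the displacement (and here also the reference frame) minimizing a matching cost. The sketch below is a generic full-search SAD matcher over several previous frames, for illustration only; the codec's actual search strategy and the randomized frame selection are not reproduced.

```python
import numpy as np

def multiframe_block_match(block, refs, top, left, radius=4):
    """Full-search block matching over multiple reference frames:
    return (frame_index, dy, dx) minimizing the sum of absolute
    differences (SAD). `refs` is a list of previous frames; (top, left)
    is the block's position in the current frame."""
    bh, bw = block.shape
    best = (np.inf, None)
    for f, ref in enumerate(refs):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                    continue                      # candidate falls off the frame
                sad = np.abs(block - ref[y:y + bh, x:x + bw]).sum()
                if sad < best[0]:
                    best = (sad, (f, dy, dx))
    return best[1]

# Synthetic check: the block appears shifted by (+2, -1) in the second
# of two reference frames.
rng = np.random.default_rng(2)
ref0 = rng.random((32, 32))
ref1 = rng.random((32, 32))
block = ref1[10:18, 7:15].copy()
match = multiframe_block_match(block, [ref0, ref1], top=8, left=8)
```

Randomizing which reference frame is searched, as the abstract proposes, would amount to drawing `refs` (or restricting `f`) stochastically per block, so that a single corrupted frame cannot poison every subsequent prediction.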
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
2013-01-02
Genotoxicity of multi-walled carbon nanotubes at occupationally relevant doses
2014-01-01
Carbon nanotubes are commercially important products of nanotechnology; however, their low density and small size make carbon nanotube respiratory exposures likely during their production or processing. We have previously shown mitotic spindle aberrations in cultured primary and immortalized human airway epithelial cells exposed to single-walled carbon nanotubes (SWCNT). In this study, we examined whether multi-walled carbon nanotubes (MWCNT) cause mitotic spindle damage in cultured cells at doses equivalent to 34 years of exposure at the NIOSH Recommended Exposure Limit (REL). MWCNT induced a dose-responsive increase in disrupted centrosomes, abnormal mitotic spindles and aneuploid chromosome number 24 hours after exposure to 0.024, 0.24, 2.4 and 24 μg/cm² MWCNT. Monopolar mitotic spindles comprised 95% of disrupted mitoses. Three-dimensional reconstructions of 0.1 μm optical sections showed carbon nanotubes integrated with microtubules, DNA and within the centrosome structure. Cell cycle analysis demonstrated a greater number of cells in S-phase and fewer cells in the G2 phase in MWCNT-treated cultures compared to diluent control, indicating a G1/S block in the cell cycle. The monopolar phenotype of the disrupted mitotic spindles and the G1/S block in the cell cycle are in sharp contrast to the multi-polar spindles and G2 block previously observed following exposure to SWCNT. One month following exposure to MWCNT there was a dramatic increase in both the size and number of colonies compared to diluent control cultures, indicating a potential to pass the genetic damage to daughter cells. Our results demonstrate significant disruption of the mitotic spindle by MWCNT at occupationally relevant exposure levels.
NASA Astrophysics Data System (ADS)
Sakaki, Yukiya; Yamada, Tomoaki; Matsui, Chihiro; Yamaga, Yusuke; Takeuchi, Ken
2018-04-01
In order to improve performance of solid-state drives (SSDs), hybrid SSDs have been proposed. Hybrid SSDs consist of more than two types of NAND flash memories or NAND flash memories and storage-class memories (SCMs). However, the cost of hybrid SSDs adopting SCMs is more expensive than that of NAND flash only SSDs because of the high bit cost of SCMs. This paper proposes unique hybrid SSDs with two-dimensional (2D) horizontal multi-level cell (MLC)/three-dimensional (3D) vertical triple-level cell (TLC) NAND flash memories to achieve higher cost-performance. The 2D-MLC/3D-TLC hybrid SSD achieves up to 31% higher performance than the conventional 2D-MLC/2D-TLC hybrid SSD. The factors of different performance between the proposed hybrid SSD and the conventional hybrid SSD are analyzed by changing its block size, read/write/erase latencies, and write unit of 3D-TLC NAND flash memory, by means of a transaction-level modeling simulator.
Extending fields in a level set method by solving a biharmonic equation
NASA Astrophysics Data System (ADS)
Moroney, Timothy J.; Lusmore, Dylan R.; McCue, Scott W.; McElwain, D. L. Sean
2017-08-01
We present an approach for computing extensions of velocities or other fields in level set methods by solving a biharmonic equation. The approach differs from other commonly used approaches to velocity extension because it deals with the interface fully implicitly through the level set function. No explicit properties of the interface, such as its location or the velocity on the interface, are required in computing the extension. These features lead to a particularly simple implementation using either a sparse direct solver or a matrix-free conjugate gradient solver. Furthermore, we propose a fast Poisson preconditioner that can be used to accelerate the convergence of the latter. We demonstrate the biharmonic extension on a number of test problems that serve to illustrate its effectiveness at producing smooth and accurate extensions near interfaces. A further feature of the method is the natural way in which it deals with symmetry and periodicity, ensuring through its construction that the extension field also respects these symmetries.
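A 1D analogue makes the approach concrete: minimizing the discrete bending energy Σ(u'')² with values fixed on "interface" nodes is exactly a discrete biharmonic solve, and the interface enters only through which nodes are constrained. The sketch below uses a dense direct solve for brevity (the paper's sparse/matrix-free machinery and level-set bookkeeping are not reproduced).

```python
import numpy as np

def biharmonic_extend(values, fixed_mask, n):
    """Extend `values`, given on the nodes where `fixed_mask` is True,
    to an n-node 1D grid by minimizing the discrete bending energy
    sum (u'')^2, i.e. solving a biharmonic problem in which the
    'interface' appears only through the constrained nodes."""
    # Second-difference operator, shape (n-2, n).
    L = np.zeros((n - 2, n))
    for i in range(n - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    B = L.T @ L                          # discrete biharmonic operator
    free = ~fixed_mask
    u = np.zeros(n)
    u[fixed_mask] = values
    # Eliminate the fixed nodes and solve for the free ones.
    rhs = -B[np.ix_(free, fixed_mask)] @ values
    u[free] = np.linalg.solve(B[np.ix_(free, free)], rhs)
    return u

# Two interface nodes with prescribed values; the extension fills the
# rest of the grid, extrapolating smoothly past the interface.
n = 100
mask = np.zeros(n, dtype=bool)
mask[[20, 60]] = True
u = biharmonic_extend(np.array([1.0, 3.0]), mask, n)
```

With only two constrained nodes the minimizer is the straight line through them (zero bending energy), extended linearly beyond the interface, which illustrates the smoothness the biharmonic choice buys over lower-order extensions.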
Park, Hyungmin; Kim, Jae-Up; Park, Soojin
2012-02-21
A simple, straightforward process for fabricating multi-scale micro- and nanostructured patterns from polystyrene-block-poly(2-vinylpyridine) (PS-b-P2VP)/poly(methyl methacrylate) (PMMA) homopolymer blends in a solvent preferential for PS and PMMA is demonstrated. When the PS-b-P2VP/PMMA blend films were spin-coated onto a silicon wafer, PS-b-P2VP micellar arrays consisting of a PS corona and a P2VP core were formed, while the PMMA macrodomains were isolated, owing to the macrophase separation caused by the incompatibility between the block copolymer micelles and the PMMA homopolymer during spin-coating. As the PMMA composition increased, the size of the PMMA macrodomains increased. Moreover, the P2VP blocks interact strongly with the native oxide on the silicon wafer surface, so a P2VP wetting layer formed first during spin-coating, and PS nanoclusters were observed beneath the PMMA macrodomains. In contrast, when the silicon surface was modified with a PS brush layer, the PS nanoclusters underlying the PMMA domains did not form. The multi-scale patterns prepared from the copolymer micelle/homopolymer blend films are used as templates for the fabrication of gold nanoparticle arrays by incorporating a gold precursor into the P2VP chains. The combination of nanostructures prepared from block copolymer micellar arrays and macrostructures induced by the incompatibility between the copolymer and the homopolymer leads to the formation of complex, multi-scale surface patterns by a simple casting process.
Automatic classification and detection of clinically relevant images for diabetic retinopathy
NASA Astrophysics Data System (ADS)
Xu, Xinyu; Li, Baoxin
2008-03-01
We propose a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using the Expectation-Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally, a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database that bear the same label as the query image and that are among the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
A detail-preserved and luminance-consistent multi-exposure image fusion algorithm
NASA Astrophysics Data System (ADS)
Wang, Guanquan; Zhou, Yue
2018-04-01
When irradiance across a scene varies greatly, we can hardly capture an image of the scene without over- or underexposed areas, because of the constraints of cameras. Multi-exposure image fusion (MEF) is an effective method to deal with this problem by fusing multi-exposure images of a static scene. A novel MEF method is described in this paper. In the proposed algorithm, coarser-scale luminance consistency is preserved by contribution adjustment using the luminance information between blocks; a detail-preserving smoothing filter stitches blocks smoothly without losing details. Experimental results show that the proposed method performs well in preserving luminance consistency and details.
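The block-based luminance adjustment is specific to this paper, but the general exposure-weighting idea behind MEF can be sketched. The toy pixel-wise fusion below uses an assumed Gaussian "well-exposedness" weight and hypothetical constant-valued frames; it is an illustration of the MEF principle, not the proposed algorithm:

```python
import numpy as np

def fuse_exposures(images, sigma=0.2):
    """Pixel-wise weighted fusion: weight each exposure by how close its
    normalized luminance is to mid-range (0.5), then blend."""
    stack = np.stack([img.astype(float) / 255.0 for img in images])
    w = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=0, keepdims=True)        # normalize weights per pixel
    return (w * stack).sum(axis=0)

under = np.full((4, 4), 30, np.uint8)        # underexposed frame
over = np.full((4, 4), 220, np.uint8)        # overexposed frame
mid = np.full((4, 4), 128, np.uint8)         # well-exposed frame
fused = fuse_exposures([under, over, mid])
```

The well-exposed frame dominates the blend, so the fused result stays near mid-range luminance; the paper's contribution is in doing this adjustment at block scale while keeping luminance consistent across blocks.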
A Multi-Modality CMOS Sensor Array for Cell-Based Assay and Drug Screening.
Chi, Taiyun; Park, Jong Seok; Butts, Jessica C; Hookway, Tracy A; Su, Amy; Zhu, Chengjie; Styczynski, Mark P; McDevitt, Todd C; Wang, Hua
2015-12-01
In this paper, we present a fully integrated multi-modality CMOS cellular sensor array with four sensing modalities to characterize different cell physiological responses, including extracellular voltage recording, cellular impedance mapping, optical detection with shadow imaging and bioluminescence sensing, and thermal monitoring. The sensor array consists of nine parallel pixel groups and nine corresponding signal conditioning blocks. Each pixel group comprises one temperature sensor and 16 tri-modality sensor pixels, while each tri-modality sensor pixel can be independently configured for extracellular voltage recording, cellular impedance measurement (voltage excitation/current sensing), and optical detection. This sensor array supports multi-modality cellular sensing at the pixel level, which enables holistic cell characterization and joint-modality physiological monitoring on the same cellular sample with a pixel resolution of 80 μm × 100 μm. Comprehensive biological experiments with different living cell samples demonstrate the functionality and benefit of the proposed multi-modality sensing in cell-based assay and drug screening.
SmaggIce 2.0: Additional Capabilities for Interactive Grid Generation of Iced Airfoils
NASA Technical Reports Server (NTRS)
Kreeger, Richard E.; Baez, Marivell; Braun, Donald C.; Schilling, Herbert W.; Vickerman, Mary B.
2008-01-01
The Surface Modeling and Grid Generation for Iced Airfoils (SmaggIce) software toolkit has been extended to allow interactive grid generation for multi-element iced airfoils. The essential phases of an icing effects study include geometry preparation, block creation and grid generation. SmaggIce Version 2.0 now includes these main capabilities for both single and multi-element airfoils, plus an improved flow solver interface and a variety of additional tools to enhance the efficiency and accuracy of icing effects studies. An overview of these features is given, especially the new multi-element blocking strategy using the multiple wakes method. Examples are given which illustrate the capabilities of SmaggIce for conducting an icing effects study for both single and multi-element airfoils.
Identification and Reconfigurable Control of Impaired Multi-Rotor Drones
NASA Technical Reports Server (NTRS)
Stepanyan, Vahram; Krishnakumar, Kalmanje; Bencomo, Alfredo
2016-01-01
The paper presents an algorithm for control and safe landing of impaired multi-rotor drones when one or more motors fail simultaneously or in any sequence. It includes three main components: an identification block, a reconfigurable control block, and a decision-making block. The identification block monitors each motor's load characteristics and current draw, from which failures are detected. The control block generates the required total thrust and three axis torques for the altitude, horizontal position and/or orientation control of the drone based on time scale separation and nonlinear dynamic inversion. The horizontal displacement is controlled by modulating the roll and pitch angles. The decision-making block maps the total thrust and three torques into the individual motor thrusts based on the information provided by the identification block. The drone continues the mission execution as long as the functioning motors can maintain controllability. Otherwise, the controller is switched to a safe mode, which gives up yaw control and commands a safe landing spot and descent rate while maintaining a horizontal attitude.
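The mapping from commanded total thrust and three torques to individual motor thrusts can be illustrated for a healthy four-rotor vehicle. The mixer matrix, arm length, and drag-torque coefficient below are assumed values for a generic "X" quadrotor; the paper's failure handling and reconfiguration logic are not modeled:

```python
import numpy as np

L = 0.2    # arm length in meters (assumed)
C = 0.05   # thrust-to-drag-torque coefficient (assumed)
d = L / np.sqrt(2)

# Rows map the four motor thrusts to [total thrust, roll, pitch, yaw torque].
M = np.array([
    [1.0, 1.0, 1.0, 1.0],    # total thrust
    [-d,   d,   d,  -d],     # roll torque
    [ d,   d,  -d,  -d],     # pitch torque
    [ C,  -C,   C,  -C],     # yaw torque (alternating rotor spin directions)
])

def motor_thrusts(total_thrust, torques):
    """Invert the mixer: commanded [T, tau_x, tau_y, tau_z] -> 4 thrusts."""
    cmd = np.concatenate([[total_thrust], torques])
    return np.linalg.solve(M, cmd)

f = motor_thrusts(20.0, np.zeros(3))   # hover command: zero torques
```

For the hover command, the inverse mixer distributes the 20 N of total thrust equally, 5 N per motor; with a failed motor the matrix loses a column and the mapping must be recomputed, which is where the identification block's output enters.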
A Study of Multigrid Preconditioners Using Eigensystem Analysis
NASA Technical Reports Server (NTRS)
Roberts, Thomas W.; Swanson, R. C.
2005-01-01
The convergence properties of numerical schemes for partial differential equations are studied by examining the eigensystem of the discrete operator. This method of analysis is very general, and allows the effects of boundary conditions and grid nonuniformities to be examined directly. Algorithms for the Laplace equation and a two-equation model hyperbolic system are examined.
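The eigensystem idea can be illustrated on the Laplace case: the sketch below forms the discrete 1D Laplacian with homogeneous Dirichlet boundary conditions and checks its computed eigenvalues against the known analytic formula (a minimal example, not the paper's multigrid analysis):

```python
import numpy as np

n = 64                           # interior grid points
h = 1.0 / (n + 1)

# Discrete 1D Laplacian with homogeneous Dirichlet boundary conditions.
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

computed = np.sort(np.linalg.eigvalsh(A))

# Analytic eigenvalues: (4/h^2) * sin^2(k*pi*h/2), k = 1..n
k = np.arange(1, n + 1)
exact = 4.0 / h**2 * np.sin(k * np.pi * h / 2) ** 2
```

Boundary conditions enter directly through the first and last rows of A, which is why this kind of analysis can expose their effect on convergence, something classical Fourier smoothing analysis cannot.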
Spectral element multigrid. Part 2: Theoretical justification
NASA Technical Reports Server (NTRS)
Maday, Yvon; Munoz, Rafael
1988-01-01
A multigrid algorithm is analyzed which is used for solving iteratively the algebraic system resulting from the approximation of a second order problem by spectral or spectral element methods. The analysis, performed here in the one dimensional case, justifies the good smoothing properties of the Jacobi preconditioner that was presented in Part 1 of this paper.
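The smoothing property that such an analysis justifies can be seen numerically: a few damped Jacobi sweeps rapidly attenuate a high-frequency error mode. The sketch below uses a finite-difference stand-in operator, not the spectral element discretization of the paper:

```python
import numpy as np

def weighted_jacobi(A, b, x, omega=2.0 / 3.0, sweeps=1):
    """Damped Jacobi relaxation: x <- x + omega * D^{-1} (b - A x)."""
    Dinv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        x = x + omega * Dinv * (b - A @ x)
    return x

n = 63
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D model problem
b = np.zeros(n)                                         # exact solution is 0

# Highest-frequency error mode of A: damped quickly by Jacobi relaxation.
j = np.arange(1, n + 1)
x0 = np.sin(j * n * np.pi / (n + 1))
x3 = weighted_jacobi(A, b, x0.copy(), sweeps=3)
```

Since b = 0, the iterate itself is the error; three sweeps reduce this oscillatory mode by more than an order of magnitude, whereas smooth modes would barely move, which is exactly the division of labor multigrid exploits.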
NASA Astrophysics Data System (ADS)
Niki, Hiroshi; Harada, Kyouji; Morimoto, Munenori; Sakakihara, Michio
2004-03-01
Several preconditioned iterative methods reported in the literature have been used for improving the convergence rate of the Gauss-Seidel method. In this article, on the basis of nonnegative matrix theory, comparisons between some splittings for such preconditioned matrices are derived. Simple numerical examples are also given.
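The kind of comparison made in the article can be reproduced numerically. The sketch below compares the spectral radius of the Gauss-Seidel iteration matrix before and after applying the classic (I + S) preconditioner, where S holds minus the first superdiagonal of A; the small M-matrix is an assumed example, not one from the article:

```python
import numpy as np

def gs_spectral_radius(A):
    """Spectral radius of the Gauss-Seidel iteration matrix (D - L)^{-1} U."""
    D_L = np.tril(A)               # D - L: lower triangle incl. diagonal
    U = -np.triu(A, 1)             # strictly upper part, negated
    T = np.linalg.solve(D_L, U)
    return np.abs(np.linalg.eigvals(T)).max()

# Small M-matrix example (assumed, for illustration).
A = np.array([[ 1.0, -0.3, -0.2],
              [-0.2,  1.0, -0.3],
              [-0.3, -0.2,  1.0]])

# Classic (I + S) preconditioner: S holds minus the first superdiagonal of A.
S = np.zeros_like(A)
idx = np.arange(len(A) - 1)
S[idx, idx + 1] = -A[idx, idx + 1]

rho_plain = gs_spectral_radius(A)
rho_pre = gs_spectral_radius((np.eye(len(A)) + S) @ A)
```

A smaller spectral radius for the preconditioned matrix means asymptotically faster Gauss-Seidel convergence, which is the quantity the splitting comparisons in the article bound.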
NASA Astrophysics Data System (ADS)
Chen, Hao; Lv, Wen; Zhang, Tongtong
2018-05-01
We study preconditioned iterative methods for the linear system arising in the numerical discretization of a two-dimensional space-fractional diffusion equation. Our approach is based on a formulation of the discrete problem that is shown to be the sum of two Kronecker products. By making use of an alternating Kronecker product splitting iteration technique we establish a class of fixed-point iteration methods. Theoretical analysis shows that the new method converges to the unique solution of the linear system. Moreover, the optimal choice of the involved iteration parameters and the corresponding asymptotic convergence rate are computed exactly when the eigenvalues of the system matrix are all real. The basic iteration is accelerated by a Krylov subspace method such as GMRES. The corresponding preconditioner has a Kronecker product structure and requires at each iteration the solution of a set of discrete one-dimensional fractional diffusion equations. We use structure preserving approximations to the discrete one-dimensional fractional diffusion operators in the action of the preconditioning matrix. Numerical examples are presented to illustrate the effectiveness of this approach.
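The "sum of two Kronecker products" structure can be sketched with ordinary second-difference matrices standing in for the (dense) fractional-difference operators. The matvec below exploits the Kronecker identity via reshaping rather than forming the full 2D matrix, which is the structural property the splitting iteration and preconditioner build on:

```python
import numpy as np

def second_diff(n):
    """Stand-in 1D operator (a true fractional difference matrix is dense)."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

nx, ny = 8, 6
Ax, Ay = second_diff(nx), second_diff(ny)

# 2D operator as a sum of two Kronecker products:
#   A = kron(I_y, Ax) + kron(Ay, I_x)
A = np.kron(np.eye(ny), Ax) + np.kron(Ay, np.eye(nx))

def apply_A(u):
    """Matvec using only the 1D factors (row-major vec convention)."""
    U = u.reshape(ny, nx)
    return (U @ Ax.T + Ay @ U).ravel()

u = np.random.default_rng(0).standard_normal(nx * ny)
```

An alternating splitting iteration treats the two Kronecker terms separately, so each half-step reduces to a set of independent one-dimensional solves, which is why the preconditioner only needs 1D fractional diffusion solvers.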
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Tsuji, Paul H.
Here, this study explores the performance and scaling of a GMRES Krylov method employed as a smoother for an algebraic multigrid (AMG) preconditioned Newton-Krylov solution approach applied to a fully-implicit variational multiscale (VMS) finite element (FE) resistive magnetohydrodynamics (MHD) formulation. In this context a Newton iteration is used for the nonlinear system and a Krylov (GMRES) method is employed for the linear subsystems. The efficiency of this approach is critically dependent on the scalability and performance of the AMG preconditioner for the linear solutions, and the performance of the smoothers plays a critical role. Krylov smoothers are considered in an attempt to reduce the time and memory requirements of existing robust smoothers based on additive Schwarz domain decomposition (DD) with incomplete LU factorization solves on each subdomain. Three time dependent resistive MHD test cases are considered to evaluate the method. The results demonstrate that the GMRES smoother can be faster, due to a decrease in the preconditioner setup time and a reduction in outer GMRESR solver iterations, and requires less memory (typically 35% less for the global GMRES smoother) than the DD ILU smoother.
Sibole, Scott C.; Erdemir, Ahmet
2012-01-01
Cells of the musculoskeletal system are known to respond to mechanical loading, and chondrocytes within cartilage are no exception. However, understanding how joint level loads relate to cell level deformations, e.g. in the cartilage, is not a straightforward task. In this study, a multi-scale analysis pipeline was implemented to post-process the results of a macro-scale finite element (FE) tibiofemoral joint model to provide joint mechanics based displacement boundary conditions to micro-scale cellular FE models of the cartilage, for the purpose of characterizing chondrocyte deformations in relation to tibiofemoral joint loading. It was possible to identify the load distribution within the knee among its tissue structures and ultimately within the cartilage among its extracellular matrix, pericellular environment and resident chondrocytes. Various cellular deformation metrics (aspect ratio change, volumetric strain, cellular effective strain and maximum shear strain) were calculated. To illustrate further utility of this multi-scale modeling pipeline, two micro-scale cartilage constructs were considered: an idealized single cell at the centroid of a 100×100×100 μm block commonly used in past research studies, and an anatomically based representation of the middle zone of tibiofemoral cartilage (an 11 cell model of the same volume). In both cases, chondrocytes experienced amplified deformations compared to those at the macro-scale, predicted by simulating one body weight compressive loading on the tibiofemoral joint. In the 11 cell case, all cells experienced less deformation than in the single cell case, and deformation also varied among cells residing in the same block. The coupling method proved to be highly scalable due to micro-scale model independence that allowed for exploitation of distributed memory computing architecture.
The method’s generalized nature also allows for substitution of any macro-scale and/or micro-scale model providing application for other multi-scale continuum mechanics problems.
NASA Astrophysics Data System (ADS)
Park, Hyungmin; Kim, Jae-Up; Park, Soojin
2012-02-01
A simple, straightforward process for fabricating multi-scale micro- and nanostructured patterns from polystyrene-block-poly(2-vinylpyridine) (PS-b-P2VP)/poly(methyl methacrylate) (PMMA) homopolymer in a preferential solvent for PS and PMMA is demonstrated. When the PS-b-P2VP/PMMA blend films were spin-coated onto a silicon wafer, PS-b-P2VP micellar arrays consisting of a PS corona and a P2VP core were formed, while the PMMA macrodomains were isolated, due to the macrophase separation caused by the incompatibility between block copolymer micelles and PMMA homopolymer during the spin-coating process. With an increase of PMMA composition, the size of the PMMA macrodomains increased. Moreover, the P2VP blocks interact strongly with the native oxide on the surface of the silicon wafer, so that a P2VP wetting layer formed first during spin-coating, and PS nanoclusters were observed beneath the PMMA macrodomains. In contrast, when the silicon surface was modified with a PS brush layer, the PS nanoclusters underlying the PMMA domains did not form. The multi-scale patterns prepared from copolymer micelle/homopolymer blend films are used as templates for the fabrication of gold nanoparticle arrays by incorporating the gold precursor into the P2VP chains. The combination of nanostructures prepared from block copolymer micellar arrays and macrostructures induced by incompatibility between the copolymer and the homopolymer leads to the formation of complex, multi-scale surface patterns by a simple casting process. Electronic supplementary information (ESI) available: AFM images of PS-b-P2VP/PMMA blend films and cross-sectional line scans. See DOI: 10.1039/c2nr11792d
NASA Astrophysics Data System (ADS)
Ravi, Sathish Kumar; Gawad, Jerzy; Seefeldt, Marc; Van Bael, Albert; Roose, Dirk
2017-10-01
A numerical multi-scale model is being developed to predict the anisotropic macroscopic material response of multi-phase steel. The embedded microstructure is given by a meso-scale Representative Volume Element (RVE), which holds the most relevant features, such as phase distribution, grain orientation, and morphology, in sufficient detail to describe the multi-phase behavior of the material. A Finite Element (FE) mesh of the RVE is constructed using statistical information from the individual phases, such as the grain size distribution and the orientation distribution function (ODF). The material response of the RVE is obtained for selected loading/deformation modes through numerical FE simulations in Abaqus. For the elasto-plastic response of the individual grains, single crystal plasticity based plastic potential functions are proposed as Abaqus material definitions. The plastic potential functions are derived using the Facet method for individual phases in the microstructure at the level of single grains. The proposed method is a new modeling framework; the results, presented in terms of macroscopic flow curves, are based on the building blocks of the approach, and the model will eventually facilitate the construction of an anisotropic yield locus of the underlying multi-phase microstructure derived from a crystal plasticity based framework.
An accurate, fast, and scalable solver for high-frequency wave propagation
NASA Astrophysics Data System (ADS)
Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.
2017-12-01
In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods, and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which limits straightforward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages.
We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand sides.
A stochastic approach to uncertainty in the equations of MHD kinematics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Phillips, Edward G., E-mail: egphillips@math.umd.edu; Elman, Howard C., E-mail: elman@cs.umd.edu
2015-03-01
The magnetohydrodynamic (MHD) kinematics model describes the electromagnetic behavior of an electrically conducting fluid when its hydrodynamic properties are assumed to be known. In particular, the MHD kinematics equations can be used to simulate the magnetic field induced by a given velocity field. While prescribing the velocity field leads to a simpler model than the fully coupled MHD system, this may introduce some epistemic uncertainty into the model. If the velocity of a physical system is not known with certainty, the magnetic field obtained from the model may not be reflective of the magnetic field seen in experiments. Additionally, uncertainty in physical parameters such as the magnetic resistivity may affect the reliability of predictions obtained from this model. By modeling the velocity and the resistivity as random variables in the MHD kinematics model, we seek to quantify the effects of uncertainty in these fields on the induced magnetic field. We develop stochastic expressions for these quantities and investigate their impact within a finite element discretization of the kinematics equations. We obtain mean and variance data through Monte Carlo simulation for several test problems. Toward this end, we develop and test an efficient block preconditioner for the linear systems arising from the discretized equations.
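The Monte Carlo mean/variance step can be sketched on a toy random-coefficient problem. The 1D diffusion equation with a lognormal random conductivity below is an assumption for illustration; the paper's setting is the discretized MHD kinematics system with random velocity and resistivity:

```python
import numpy as np

def solve_diffusion(kappa, n=50):
    """Solve -(kappa u')' = 1 on (0,1), u(0)=u(1)=0, constant random kappa."""
    h = 1.0 / (n + 1)
    A = kappa / h**2 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    return np.linalg.solve(A, np.ones(n))

# Monte Carlo over the random coefficient: sample, solve, accumulate stats.
rng = np.random.default_rng(42)
samples = np.array([solve_diffusion(rng.lognormal(0.0, 0.3))
                    for _ in range(200)])
u_mean = samples.mean(axis=0)
u_var = samples.var(axis=0, ddof=1)
```

Each Monte Carlo sample requires a full linear solve, which is why an efficient preconditioner for the discretized system, the paper's final contribution, matters so much in this setting.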
Quantitative analysis of pedestrian safety at uncontrolled multi-lane mid-block crosswalks in China.
Zhang, Cunbao; Zhou, Bin; Chen, Guojun; Chen, Feng
2017-11-01
Pedestrian-vehicle crashes at mid-block crosswalks severely threaten pedestrian safety around the world. The situation is even worse in China due to the low yielding rate of vehicles at crosswalks. In order to quantitatively analyze pedestrian safety at multi-lane mid-block crosswalks, the number of pedestrian-vehicle conflicts was utilized to evaluate pedestrians' accident risk. Five mid-block crosswalks (Wuhan, China) were videoed to collect data on traffic conditions and pedestrian-vehicle conflicts, and the quantity and spatial distribution of pedestrian-vehicle conflicts at multi-lane mid-block crosswalks were analyzed according to lane-based post-encroachment time (LPET). Statistical results indicate that conflicts are mainly concentrated in lanes 3 and 6. The percentages of conflicts in lanes 1 to 6 are, respectively, 4.1%, 13.1%, 19.8%, 8.4%, 19.0%, and 28.1%. Conflict rates under different crossing strategies are also reported. Moreover, an ordered probit (OP) model of pedestrian-vehicle conflict analysis (PVCA) was built to find out the contributions of factors (such as traffic volume, vehicle speed, pedestrian crossing behavior, and pedestrian refuges) to pedestrian-vehicle conflicts. The results show that pedestrian refuges have a positive effect on pedestrian safety; on the other hand, high vehicle speed, high traffic volume, the rolling-gap crossing pattern, and larger pedestrian platoons have negative effects on pedestrian safety. Based on our field observations and the PVCA model, the number of conflicts rises by 2% when traffic volume increases by 200 pcu/h; similarly, if vehicle speed increases by 5 km/h, the number of conflicts rises by 12%. These results can be used to evaluate pedestrian safety at multi-lane mid-block crosswalks, and to improve it by means of pedestrian safety education, pedestrian refuge installation, vehicle speed limits, and so on.
A grid generation system for multi-disciplinary design optimization
NASA Technical Reports Server (NTRS)
Jones, William T.; Samareh-Abolhassani, Jamshid
1995-01-01
A general multi-block three-dimensional volume grid generator is presented which is suitable for Multi-Disciplinary Design Optimization. The code is timely, robust, highly automated, and written in ANSI 'C' for platform independence. Algebraic techniques are used to generate and/or modify block face and volume grids to reflect geometric changes resulting from design optimization. Volume grids are generated/modified in a batch environment and controlled via an ASCII user input deck. This allows the code to be incorporated directly into the design loop. Generated volume grids are presented for a High Speed Civil Transport (HSCT) Wing/Body geometry as well as a complex HSCT configuration including horizontal and vertical tails, engine nacelles and pylons, and canard surfaces.
Hierarchical vs non-hierarchical audio indexation and classification for video genres
NASA Astrophysics Data System (ADS)
Dammak, Nouha; BenAyed, Yassine
2018-04-01
In this paper, Support Vector Machines (SVMs) are used for segmenting and indexing video genres based only on audio features extracted at the block level, which has the advantage of capturing local temporal information. The main contribution of our study is to show the significant effect on classification accuracy of using a hierarchical categorization structure based on the Mel Frequency Cepstral Coefficients (MFCC) audio descriptor. The classification covers three common video genres: sports videos, music clips, and news scenes. The sub-classification may divide each genre into several multi-speaker and multi-dialect sub-genres. The validation of this approach was carried out on over 360 minutes of video, yielding a classification accuracy of over 99%.
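A hierarchical (two-stage) classification structure of this kind can be sketched with a nearest-centroid classifier standing in for the SVMs: a coarse genre stage routes each sample to a per-genre sub-genre stage. The features, genres, and sub-genres below are hypothetical toy data, not the MFCC pipeline of the paper:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """One centroid per class label."""
    labels = sorted(set(y))
    return {c: X[np.array(y) == c].mean(axis=0) for c in labels}

def nearest_centroid_predict(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

# Toy 2-D "audio features"; genre then sub-genre (hypothetical data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
genres = ["news", "news", "music", "music"]
subgenres = {"news": (X[:2], ["anchor", "field"]),
             "music": (X[2:], ["pop", "rock"])}

genre_model = nearest_centroid_fit(X, genres)
sub_models = {g: nearest_centroid_fit(Xg, yg)
              for g, (Xg, yg) in subgenres.items()}

def classify(x):
    """Stage 1: pick the genre; stage 2: pick the sub-genre within it."""
    g = nearest_centroid_predict(genre_model, x)
    return g, nearest_centroid_predict(sub_models[g], x)
```

Each stage only has to separate a few classes, which is the structural reason a hierarchy can outperform one flat multi-class classifier.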
Implementation of Hybrid V-Cycle Multilevel Methods for Mixed Finite Element Systems with Penalty
NASA Technical Reports Server (NTRS)
Lai, Chen-Yao G.
1996-01-01
The goal of this paper is the implementation of hybrid V-cycle hierarchical multilevel methods for the indefinite discrete systems which arise when a mixed finite element approximation is used to solve elliptic boundary value problems. By introducing a penalty parameter, the perturbed indefinite system can be reduced to a symmetric positive definite system containing the small penalty parameter for the velocity unknown alone. We stabilize the hierarchical spatial decomposition approach proposed by Cai, Goldstein, and Pasciak for the reduced system. We demonstrate that the relative condition number of the preconditioner is bounded uniformly with respect to the penalty parameter, the number of levels and possible jumps of the coefficients as long as they occur only across the edges of the coarsest elements.
Chen, G.; Chacón, L.
2015-08-11
For decades, the Vlasov–Darwin model has been recognized to be attractive for particle-in-cell (PIC) kinetic plasma simulations in non-radiative electromagnetic regimes, to avoid radiative noise issues and gain computational efficiency. However, the Darwin model results in an elliptic set of field equations that renders conventional explicit time integration unconditionally unstable. We explore a fully implicit PIC algorithm for the Vlasov–Darwin model in multiple dimensions, which overcomes many difficulties of traditional semi-implicit Darwin PIC algorithms. The finite-difference scheme for the Darwin field equations and particle equations of motion is space–time-centered, employing particle sub-cycling and orbit-averaging. This algorithm conserves total energy, local charge, and canonical momentum in the ignorable direction, and preserves the Coulomb gauge exactly. An asymptotically well-posed fluid preconditioner allows efficient use of large cell sizes, which are determined by accuracy considerations, not stability, and can be orders of magnitude larger than required in a standard explicit electromagnetic PIC simulation. Finally, we demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 2D–3V.
Preconditioner Circuit Analysis
2011-09-01
Matthew J. Nye. Naval Postgraduate School, Monterey, CA 93943-5000. The thesis compares results of the simulations with the theoretical computations and is organized into four chapters.
Multi-color incomplete Cholesky conjugate gradient methods for vector computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poole, E.L.
1986-01-01
This research is concerned with the solution on vector computers of linear systems of equations Ax = b, where A is a large, sparse symmetric positive definite matrix with non-zero elements lying only along a few diagonals of the matrix. The system is solved using the incomplete Cholesky conjugate gradient method (ICCG). Multi-color orderings of the unknowns in the linear system are used to obtain p-color matrices, for which a no-fill block ICCG method is implemented on the CYBER 205 with O(N/p) length vector operations in both the decomposition of A and, more importantly, in the forward and back solves necessary at each iteration of the method (N is the number of unknowns and p is a small constant). A p-colored matrix is a matrix that can be partitioned into a p x p block matrix where the diagonal blocks are diagonal matrices. The matrix is stored by diagonals, and matrix multiplication by diagonals is used to carry out the decomposition of A and the forward and back solves. Additionally, if the vectors across adjacent blocks line up, some of the overhead associated with vector startups can be eliminated in the matrix-vector multiplication necessary at each conjugate gradient iteration. Necessary and sufficient conditions are given to determine which multi-color orderings of the unknowns correspond to p-color matrices, and a process is indicated for choosing multi-color orderings.
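The p-color matrix property is easy to see for the classic red-black (p = 2) ordering of a 1D model problem: after permuting even-indexed unknowns ahead of odd-indexed ones, both diagonal blocks of the permuted matrix are diagonal matrices, which is what enables the long no-fill vector operations described above. A minimal sketch:

```python
import numpy as np

n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D model problem

# Red-black (2-color) ordering: even-indexed unknowns first, then odd.
perm = np.concatenate([np.arange(0, n, 2), np.arange(1, n, 2)])
P = np.eye(n)[perm]            # permutation matrix
Arb = P @ A @ P.T              # reordered system matrix

half = n // 2
red_block = Arb[:half, :half]      # red-red coupling
black_block = Arb[half:, half:]    # black-black coupling
# Both diagonal blocks are themselves diagonal: a 2-color (p = 2) matrix.
```

Because same-color unknowns never couple in the tridiagonal stencil, the triangular solves decouple within each color and vectorize over blocks of length about N/p.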
Correia, Andrew W; Peters, Junenette L; Levy, Jonathan I; Melly, Steven; Dominici, Francesca
2013-10-08
To investigate whether exposure to aircraft noise increases the risk of hospitalization for cardiovascular diseases in older people (≥ 65 years) residing near airports. Multi-airport retrospective study of approximately 6 million older people residing near airports in the United States. We superimposed contours of aircraft noise levels (in decibels, dB) for 89 airports for 2009 provided by the US Federal Aviation Administration on census block resolution population data to construct two exposure metrics applicable to zip code resolution health insurance data: population weighted noise within each zip code, and 90th centile of noise among populated census blocks within each zip code. 2218 zip codes surrounding 89 airports in the contiguous states. 6 027 363 people eligible to participate in the national medical insurance (Medicare) program (aged ≥ 65 years) residing near airports in 2009. Percentage increase in the hospitalization admission rate for cardiovascular disease associated with a 10 dB increase in aircraft noise, for each airport and on average across airports adjusted by individual level characteristics (age, sex, race), zip code level socioeconomic status and demographics, zip code level air pollution (fine particulate matter and ozone), and roadway density. Averaged across all airports and using the 90th centile noise exposure metric, a zip code with 10 dB higher noise exposure had a 3.5% higher (95% confidence interval 0.2% to 7.0%) cardiovascular hospital admission rate, after controlling for covariates. Despite limitations related to potential misclassification of exposure, we found a statistically significant association between exposure to aircraft noise and risk of hospitalization for cardiovascular diseases among older people living near airports.
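The two zip-code exposure metrics described, population-weighted noise and the 90th centile of noise among populated census blocks, can be sketched directly. The block-level noise levels and populations below are hypothetical values, not study data:

```python
import numpy as np

def zip_noise_metrics(block_noise_db, block_pop):
    """Two zip-code exposure metrics from census-block data.

    block_noise_db: noise level (dB) for each census block in the zip code
    block_pop: population of each census block
    """
    noise = np.asarray(block_noise_db, float)
    pop = np.asarray(block_pop, float)
    weighted = (noise * pop).sum() / pop.sum()   # population-weighted noise
    populated = noise[pop > 0]                   # drop unpopulated blocks
    p90 = np.percentile(populated, 90)           # 90th centile of noise
    return weighted, p90

# Hypothetical blocks in one zip code:
w, p90 = zip_noise_metrics([45, 50, 55, 60, 65], [100, 0, 300, 100, 0])
```

The weighted metric reflects where people actually live, while the 90th centile captures the upper tail of exposure among populated blocks; the study reports its main result using the latter.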
Correia, Andrew W; Peters, Junenette L; Levy, Jonathan I; Melly, Steven
2013-01-01
Objective To investigate whether exposure to aircraft noise increases the risk of hospitalization for cardiovascular diseases in older people (≥65 years) residing near airports. Design Multi-airport retrospective study of approximately 6 million older people residing near airports in the United States. We superimposed contours of aircraft noise levels (in decibels, dB) for 89 airports for 2009 provided by the US Federal Aviation Administration on census block resolution population data to construct two exposure metrics applicable to zip code resolution health insurance data: population weighted noise within each zip code, and 90th centile of noise among populated census blocks within each zip code. Setting 2218 zip codes surrounding 89 airports in the contiguous states. Participants 6 027 363 people eligible to participate in the national medical insurance (Medicare) program (aged ≥65 years) residing near airports in 2009. Main outcome measures Percentage increase in the hospitalization admission rate for cardiovascular disease associated with a 10 dB increase in aircraft noise, for each airport and on average across airports adjusted by individual level characteristics (age, sex, race), zip code level socioeconomic status and demographics, zip code level air pollution (fine particulate matter and ozone), and roadway density. Results Averaged across all airports and using the 90th centile noise exposure metric, a zip code with 10 dB higher noise exposure had a 3.5% higher (95% confidence interval 0.2% to 7.0%) cardiovascular hospital admission rate, after controlling for covariates. Conclusions Despite limitations related to potential misclassification of exposure, we found a statistically significant association between exposure to aircraft noise and risk of hospitalization for cardiovascular diseases among older people living near airports.
Refining the GPS Space Service Volume (SSV) and Building a Multi-GNSS SSV
NASA Technical Reports Server (NTRS)
Parker, Joel J. K.
2017-01-01
The GPS (Global Positioning System) Space Service Volume (SSV) was first defined to protect the GPS main-lobe signals from changes from block to block. First developed as a concept by NASA in 2000, it has been adopted for the GPS III block of satellites, and is being used well beyond the current specification to enable increased navigation performance for key missions like GOES-R. NASA has engaged the US IFOR (Interagency Forum for Operational Requirements) process to adopt a revised requirement to protect this increased and emerging use. NASA is also working through the UN International Committee on GNSS (Global Navigation Satellite System) to develop an interoperable multi-GNSS SSV in partnership with all of the foreign GNSS providers.
An iconic programming language for sensor-based robots
NASA Technical Reports Server (NTRS)
Gertz, Matthew; Stewart, David B.; Khosla, Pradeep K.
1993-01-01
In this paper we describe an iconic programming language called Onika for sensor-based robotic systems. Onika is both modular and reconfigurable and can be used with any system architecture and real-time operating system. Onika is also a multi-level programming environment wherein tasks are built by connecting a series of icons which, in turn, can be defined in terms of other icons at the lower levels. Expert users are also allowed to use control block form to define servo tasks. The icons in Onika are both shape and color coded, like the pieces of a jigsaw puzzle, thus providing a form of error control in the development of high level applications.
Automatic Generation of Cycle-Approximate TLMs with Timed RTOS Model Support
NASA Astrophysics Data System (ADS)
Hwang, Yonghyun; Schirner, Gunar; Abdi, Samar
This paper presents a technique for automatically generating cycle-approximate transaction level models (TLMs) for multi-process applications mapped to embedded platforms. It incorporates three key features: (a) basic block level timing annotation, (b) RTOS model integration, and (c) RTOS overhead delay modeling. The inputs to TLM generation are application C processes and their mapping to processors in the platform. A processor data model, including pipelined datapath, memory hierarchy and branch delay model is used to estimate basic block execution delays. The delays are annotated to the C code, which is then integrated with a generated SystemC RTOS model. Our abstract RTOS provides dynamic scheduling and inter-process communication (IPC) with processor- and RTOS-specific pre-characterized timing. Our experiments using a MP3 decoder and a JPEG encoder show that timed TLMs, with integrated RTOS models, can be automatically generated in less than a minute. Our generated TLMs simulated three times faster than real-time and showed less than 10% timing error compared to board measurements.
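Feature (a), basic block level timing annotation, can be sketched as follows; the latency numbers and the `wait()` annotation format are invented stand-ins for the paper's processor data model and generated SystemC code:

```python
# Hypothetical basic-block delay model: estimate a block's cycle count from a
# simple processor model, then emit the wait() call that would be annotated
# into the C code at the block's exit. All latency numbers are invented.

CLOCK_NS = 10          # assumed 100 MHz target processor
MEM_LATENCY = 4        # assumed cycles per memory access
BRANCH_PENALTY = 3     # assumed pipeline refill cycles

def block_delay_ns(n_alu, n_mem, ends_in_branch):
    cycles = n_alu + n_mem * MEM_LATENCY
    if ends_in_branch:
        cycles += BRANCH_PENALTY
    return cycles * CLOCK_NS

def annotate(blocks):
    # blocks: list of {label, alu, mem, branch} dicts from CFG analysis
    return [(b["label"], f"wait({block_delay_ns(b['alu'], b['mem'], b['branch'])});")
            for b in blocks]

cfg = [{"label": "BB0", "alu": 5, "mem": 2, "branch": False},
       {"label": "BB1", "alu": 3, "mem": 1, "branch": True}]
annotations = annotate(cfg)
```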
1983-09-01
…temperature level affects the flow of the lignin (naturally found in cork), which acts as a binder. Then the bound cork granules are carbonized in an inert… …applications and was used to estimate the retardation characteristics in terms of an overall thermal resistance and unit weight. Multi-layer…
Progress Towards a Rad-Hydro Code for Modern Computing Architectures LA-UR-10-02825
NASA Astrophysics Data System (ADS)
Wohlbier, J. G.; Lowrie, R. B.; Bergen, B.; Calef, M.
2010-11-01
We are entering an era of high performance computing where data movement is the overwhelming bottleneck to scalable performance, as opposed to the speed of floating-point operations per processor. All multi-core hardware paradigms, whether heterogeneous or homogeneous, be it the Cell processor, GPGPU, or multi-core x86, share this common trait. In multi-physics applications such as inertial confinement fusion or astrophysics, one may be solving multi-material hydrodynamics with tabular equation of state data lookups, radiation transport, nuclear reactions, and charged particle transport in a single time cycle. The algorithms are intensely data dependent (e.g., EOS, opacity, and nuclear data lookups), and multi-core hardware memory restrictions are forcing code developers to rethink code and algorithm design. For the past two years LANL has been funding a small effort referred to as Multi-Physics on Multi-Core to explore ideas for code design as pertaining to inertial confinement fusion and astrophysics applications. The near-term goals of this project are to have a multi-material radiation hydrodynamics capability, with tabular equation of state lookups, on Cartesian and curvilinear block-structured meshes. In the longer term we plan to add fully implicit multi-group radiation diffusion and material heat conduction, and block-structured AMR. We will report on our progress to date.
TopMaker: A Technique for Automatic Multi-Block Topology Generation Using the Medial Axis
NASA Technical Reports Server (NTRS)
Heidmann, James D. (Technical Monitor); Rigby, David L.
2004-01-01
A two-dimensional multi-block topology generation technique has been developed. Very general configurations are addressable by the technique. A configuration is defined by a collection of non-intersecting closed curves, which will be referred to as loops. More than a single loop implies that holes exist in the domain, which poses no problem. This technique requires only the medial vertices and the touch points that define each vertex. From the information about the medial vertices, the connectivity between medial vertices is generated. The physical shape of the medial edge is not required. By applying a few simple rules to each medial edge, the multiblock topology is generated with no user intervention required. The resulting topologies contain only the level of complexity dictated by the configurations. Grid lines remain attached to the boundary except at sharp concave turns where a change in index family is introduced as would be desired. Keeping grid lines attached to the boundary is especially important in the area of computational fluid dynamics where highly clustered grids are used near no-slip boundaries. This technique is simple and robust and can easily be incorporated into the overall grid generation process.
Modelling of the Vajont rockslide displacements by delayed plasticity of interacting sliding blocks
NASA Astrophysics Data System (ADS)
Castellanza, Riccardo; Hedge, Amarnath; Crosta, Giovanni; di Prisco, Claudio; Frigerio, Gabriele
2015-04-01
In order to model complex sliding masses subject to continuous slow movements related to water table fluctuations, it is convenient to: i) model the time-dependent mechanical behaviour of the materials by means of a visco-plastic constitutive law; ii) assume the water table fluctuation as the main input inducing displacement acceleration; iii) account for the 3D constraints while maintaining a level of simplicity that allows implementation into an EWS (Early Warning System) for risk management. In this work a 1D pseudo-dynamic visco-plastic model (Secondi et al. 2011), based on Perzyna's delayed plasticity theory, is applied. The sliding mass is considered as a rigid block subject to its self-weight, inertial forces, and seepage forces varying with time. All non-linearities are lumped in a thin layer positioned between the rigid block and the stable bedrock. The mechanical response of this interface is assumed to be visco-plastic. The viscous nucleus is assumed to be of the exponential type, so that irreversible strains develop for both positive and negative values of the yield function; the sliding mass is discretized in blocks to cope with complex rockslide geometries; the friction angle is assumed to reduce with strain rate following a strain-rate law (Dieterich-Ruina law). To validate the improvements introduced in this paper, the displacements of the Vajont rockslide from 1960 to the failure, which occurred on 9 October 1963, are simulated. It will be shown that, in its modified version, the model satisfactorily fits the Vajont pre-collapse displacements triggered by the fluctuation of the Vajont lake level and the associated groundwater level. The model is able to follow the critical acceleration of the motion with a minimal change in friction properties. The discretization in interacting sliding blocks confirms its suitability to model the complex 3D rockslide behaviour.
We are currently implementing a multi-block model capable of including the mutual influence of multiple blocks, characterized by different geometries and groundwater levels, shear zone properties, and types of interconnection. Secondi M., Crosta G., Di Prisco C., Frigerio G., Frattini P., Agliardi F. (2011) "Landslide motion forecasting by a dynamic visco-plastic model", Proc. The Second World Landslide Forum, L09 - Advances in slope modelling, Rome, 3-9 October 2011, paper WLF2-2011-0571
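A single-block version of the delayed-plasticity idea can be sketched in a few lines; the exponential viscous nucleus lets irreversible slip accrue for any sign of the yield function, accelerating as the water table rises. All parameter values here are invented, and the real model adds inertia, block interaction, and a rate-dependent friction angle:

```python
import math

# Minimal 1D sliding block with a Perzyna-type exponential viscous nucleus.

def simulate(water_levels, v0=1e-4, f_ref=1.0, friction=0.6, weight=1.0, dt=1.0):
    u, history = 0.0, []
    for h in water_levels:
        driving = 0.5 * weight + 0.3 * h   # destabilizing force rises with the water table
        resisting = friction * weight
        yield_f = driving - resisting
        # Irreversible slip accrues for any sign of the yield function:
        # fast when yield_f > 0, creeping when yield_f < 0.
        u += dt * v0 * math.exp(yield_f / f_ref)
        history.append(u)
    return history

dry = simulate([0.0] * 50)   # low water table: slow creep
wet = simulate([1.0] * 50)   # high water table: accelerated slip
```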
Guggenheim, S. Frederic
1986-01-01
A multi-port fluid valve apparatus is used to control the flow of fluids through a plurality of valves and includes a web, which preferably is a stainless steel endless belt. The belt has an aperture therethrough and is progressed, under motor drive and control, so that its aperture is moved from one valve mechanism to another. Each of the valve mechanisms comprises a pair of valve blocks which are held in fluid-tight relationship against the belt. Each valve block consists of a block having a bore through which the fluid flows, a first seal surrounding the bore and a second seal surrounding the first seal, with the distance between the first and second seals being greater than the size of the belt aperture. In order to open a valve, the motor progresses the belt aperture to where it is aligned with the two bores of a pair of valve blocks, such alignment permitting a flow of the fluid through the valve. The valve is closed by movement of the belt aperture and its replacement, within the pair of valve blocks, by a solid portion of the belt.
A Multi-Scale Settlement Matching Algorithm Based on ARG
NASA Astrophysics Data System (ADS)
Yue, Han; Zhu, Xinyan; Chen, Di; Liu, Lingjia
2016-06-01
Homonymous entity matching is an important part of multi-source spatial data integration, automatic updating, and change detection. Considering the low accuracy of existing methods in matching multi-scale settlement data, an algorithm based on the Attributed Relational Graph (ARG) is proposed. The algorithm first divides two settlement scenes at different scales into blocks by the small-scale road network and constructs local ARGs in each block. Then it ascertains candidate sets through merging procedures and obtains the optimal matching pairs by iteratively comparing the similarity of the ARGs. Finally, the corresponding relations between settlements at large and small scales are identified. At the end of this article, a demonstration is presented and the results indicate that the proposed algorithm is capable of handling sophisticated cases.
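The ARG comparison step can be illustrated with a toy similarity score that mixes node-attribute agreement with edge (relation) preservation under a candidate pairing; the 50/50 weighting and the attribute measure are our assumptions, not the paper's:

```python
# Toy attributed-relational-graph similarity: node attributes (e.g. settlement
# area) are compared pairwise, and an edge counts when the pairing preserves it.

def node_sim(a, b):
    return 1.0 - abs(a - b) / max(a, b)

def arg_similarity(g1, g2, pairing):
    attr = sum(node_sim(g1["nodes"][u], g2["nodes"][v]) for u, v in pairing.items())
    attr /= len(pairing)
    edges_kept = sum(1 for (u, w) in g1["edges"]
                     if (pairing.get(u), pairing.get(w)) in g2["edges"])
    edge = edges_kept / max(len(g1["edges"]), 1)
    return 0.5 * attr + 0.5 * edge   # assumed equal weighting

large_scale = {"nodes": {"A": 10.0, "B": 6.0}, "edges": {("A", "B")}}
small_scale = {"nodes": {"a": 9.0, "b": 6.5}, "edges": {("a", "b")}}
score = arg_similarity(large_scale, small_scale, {"A": "a", "B": "b"})
```

In the full algorithm this score would be evaluated over many candidate pairings within each road-network block and the best-scoring pairing kept.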
Li, Jun-Ying; Hu, Yuan-Man; Chen, Wei; Liu, Miao; Hu, Jian-Bo; Zhong, Qiao-Lin; Lu, Ning
2012-06-01
Population is the most active factor affecting city development. To understand the distribution characteristics of urban population is of significance for making city policy decisions and for optimizing the layout of various urban infrastructures. In this paper, the information of the residential buildings in Shenyang urban area was extracted from the QuickBird remote sensing images, and the spatial distribution characteristics of the population within the Third-Ring Road of the City were analyzed, according to the social and economic statistics data. In 2010, the population density in different types of residential buildings within the Third-Ring Road of the City decreased in the order of high-storey block, mixed block, mixed garden, old multi-storey building, high-storey garden, multi-storey block, multi-storey garden, villa block, shanty, and villa garden. The vacancy rate of the buildings within the Third-Ring Road was more than 30%, meaning that the real estate market was seriously overstocked. Among the five Districts of Shenyang City, Shenhe District had the highest potential population density, while Tiexi District and Dadong District had a lower one. The gravity center of the City and its five Districts was also analyzed, which could provide basic information for locating commercial facilities and planning city infrastructure.
Fowler, Christopher S.
2015-01-01
Neighborhoods and neighborhood change are often at least implicitly understood in relation to processes taking place at scales both smaller than and larger than the neighborhood itself. Until recently our capacity to represent these multi-scalar processes with quantitative measures has been limited. Recent work on “segregation profiles” by Reardon and collaborators (Reardon et al., 2008, 2009) expands our capacity to explore the relationship between population measures and scale. With the methodological tools now available, we need a conceptual shift in how we view population measures in order to bring our theories and measures of neighborhoods into alignment. I argue that segregation can be beneficially viewed as multi-scalar; not a value calculable at some ‘correct’ scale, but a continuous function with respect to scale. This shift requires new ways of thinking about and analyzing segregation with respect to scale that engage with the complexity of the multi-scalar measure. Using block level data for eight neighborhoods in Seattle, Washington I explore the implications of a multi-scalar segregation measure for understanding neighborhoods and neighborhood change from 1990 to 2010. PMID:27041785
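The core idea, segregation as a continuous function of scale rather than a single number, can be sketched with blocks on a line: compute each block's local minority share within a radius r, and track the variance of those shares as r grows. This is a simplified stand-in for the Reardon-style profile, with invented data:

```python
# Segregation profile sketch: the same statistic evaluated at several scales.

def local_shares(minority, total, r):
    # Share of the minority group within each block's r-neighborhood.
    n = len(total)
    shares = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        shares.append(sum(minority[lo:hi]) / sum(total[lo:hi]))
    return shares

def segregation(minority, total, r):
    # Variance of local shares: high when groups are spatially separated.
    s = local_shares(minority, total, r)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# A sharply divided street: minority households on one half only.
minority = [10] * 5 + [0] * 5
total = [10] * 10
profile = [segregation(minority, total, r) for r in (0, 1, 2, 4)]
```

For this divided street the profile starts at its maximum and decays as the neighborhood radius grows, which is exactly the scale-dependence the passage argues a single fixed-scale number cannot capture.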
2013-01-01
…intelligently selecting waveform parameters using adaptive algorithms. The adaptive algorithms optimize the waveform parameters based on (1) the EM… …the environment. Subject terms: cognitive radar, adaptive sensing, spectrum sensing, multi-objective optimization, genetic algorithms, machine… [Recovered figure captions: detection and classification block diagram; genetic algorithm block diagram.]
Inversion of potential field data using the finite element method on parallel computers
NASA Astrophysics Data System (ADS)
Gross, L.; Altinay, C.; Shaw, S.
2015-11-01
In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs, namely for the potential fields, for the adjoint defects, and for the application of the preconditioner. Extending the traditional discrete formulation, the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by the weighting of the regularization and cross-gradient terms but is independent of the resolution of the PDE discretization, and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.
A model reduction approach to numerical inversion for a parabolic partial differential equation
NASA Astrophysics Data System (ADS)
Borcea, Liliana; Druskin, Vladimir; Mamonov, Alexander V.; Zaslavsky, Mikhail
2014-12-01
We propose a novel numerical inversion algorithm for the coefficients of parabolic partial differential equations, based on model reduction. The study is motivated by the application of controlled source electromagnetic exploration, where the unknown is the subsurface electrical resistivity and the data are time resolved surface measurements of the magnetic field. The algorithm presented in this paper considers inversion in one and two dimensions. The reduced model is obtained with rational interpolation in the frequency (Laplace) domain and a rational Krylov subspace projection method. It amounts to a nonlinear mapping from the function space of the unknown resistivity to the small dimensional space of the parameters of the reduced model. We use this mapping as a nonlinear preconditioner for the Gauss-Newton iterative solution of the inverse problem. The advantage of the inversion algorithm is twofold. First, the nonlinear preconditioner resolves most of the nonlinearity of the problem. Thus the iterations are less likely to get stuck in local minima and the convergence is fast. Second, the inversion is computationally efficient because it avoids repeated accurate simulations of the time-domain response. We study the stability of the inversion algorithm for various rational Krylov subspaces, and assess its performance with numerical experiments.
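The role of the nonlinear preconditioner can be illustrated with a scalar toy problem: Newton on a strongly nonlinear residual is slow to enter its quadratic regime, while composing with a map that absorbs the nonlinearity (here log against exp, standing in for the reduced-order model) makes the iteration essentially linear. The analogy and all values below are ours, not the authors':

```python
import math

# Solve f(m) = d for f = exp, first directly, then through the
# "preconditioned" residual log(f(m)) - log(d) = m - log(d).

def newton_scalar(g, dg, x0, tol=1e-12, max_it=100):
    x, it = x0, 0
    while abs(g(x)) > tol and it < max_it:
        x -= g(x) / dg(x)
        it += 1
    return x, it

d = 5.0
# Raw residual exp(m) - d: from a poor start Newton crawls toward the root.
raw, it_raw = newton_scalar(lambda m: math.exp(m) - d, lambda m: math.exp(m), 10.0)
# Preconditioned residual m - log(d) is exactly linear: one Newton step.
pre, it_pre = newton_scalar(lambda m: m - math.log(d), lambda m: 1.0, 10.0)
```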
NASA Astrophysics Data System (ADS)
Kong, Fande; Cai, Xiao-Chuan
2017-07-01
Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
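The inexact Newton idea, an outer Newton loop whose linear correction is solved only approximately, can be sketched on a tiny 2x2 system; a few Jacobi sweeps stand in for the preconditioned Krylov solver of the abstract:

```python
# Inexact Newton on F(x) = 0 with root (1, 2); the correction J s = -F is
# solved approximately by Jacobi sweeps (valid here: J is diagonally dominant
# near the root).

def F(x):
    return [x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0]

def J(x):
    return [[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]]

def jacobi_solve(A, b, sweeps=15):
    s = [0.0, 0.0]
    for _ in range(sweeps):
        # Both components updated from the previous sweep (Jacobi, not Gauss-Seidel).
        s = [(b[0] - A[0][1] * s[1]) / A[0][0],
             (b[1] - A[1][0] * s[0]) / A[1][1]]
    return s

def inexact_newton(x0, iters=20):
    x = list(x0)
    for _ in range(iters):
        r = F(x)
        s = jacobi_solve(J(x), [-r[0], -r[1]])
        x = [x[0] + s[0], x[1] + s[1]]
    return x

x = inexact_newton([2.0, 3.0])
```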
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dennis C. Smolarski, S.J.
Project Abstract This project was a continuation of work begun under a subcontract issued off of TSI-DOE Grant 1528746, awarded to the University of Illinois Urbana-Champaign. Dr. Anthony Mezzacappa is the Principal Investigator on the Illinois award. A separate award was issued to Santa Clara University to continue the collaboration during the period May 2003–2004. Smolarski continued to work on preconditioner technology and its interface with various iterative methods. He worked primarily with F. Doug Swesty (SUNY-Stony Brook) in continuing software development started in the 2002-03 academic year. Special attention was paid to the development and testing of different sparse approximate inverse preconditioners and their use in the solution of linear systems arising from radiation transport equations. The target was a high-performance platform on which efficient implementation is a critical component of the overall effort. Smolarski also focused on the integration of the adaptive iterative algorithm, Chebycode, developed by Tom Manteuffel and Steve Ashby and adapted by Ryan Szypowski for parallel platforms, into the radiation transport code being developed at SUNY-Stony Brook.
Performance of fully-coupled algebraic multigrid preconditioners for large-scale VMS resistive MHD
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, P. T.; Shadid, J. N.; Hu, J. J.
2017-11-06
Here, we explore the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. Our study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of the original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.
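A full AMG preconditioner is beyond a few lines, but the structure of a preconditioned Krylov solve can be shown with conjugate gradients and a Jacobi (diagonal) preconditioner taking AMG's place; the matrix and right-hand side are invented:

```python
# Preconditioned conjugate gradients on a small SPD system. In the paper the
# preconditioner application (z = M^{-1} r) would be an AMG V-cycle; here a
# diagonal scaling stands in to show where the preconditioner enters.

def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def pcg(A, b, tol=1e-12, max_it=50):
    n = len(b)
    x = [0.0] * n
    r = list(b)                                # r = b - A*x with x = 0
    m_inv = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [mi * ri for mi, ri in zip(m_inv, r)]
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(1, max_it + 1):
        Ap = matvec(A, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if max(abs(ri) for ri in r) < tol:
            return x, it
        z = [mi * ri for mi, ri in zip(m_inv, r)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # small SPD stand-in
b = [1.0, 2.0, 0.0]
x_sol, iters = pcg(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x_sol)))
```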
Real-time simulation of contact and cutting of heterogeneous soft-tissues.
Courtecuisse, Hadrien; Allard, Jérémie; Kerfriden, Pierre; Bordas, Stéphane P A; Cotin, Stéphane; Duriez, Christian
2014-02-01
This paper presents a numerical method for interactive (real-time) simulations, which considerably improves the accuracy of the response of heterogeneous soft-tissue models undergoing contact, cutting, and other topological changes. We provide an integrated methodology able to deal with the ill-conditioning issues associated with material heterogeneities, with contact boundary conditions, which are one of the main sources of inaccuracy, and with cutting, which is one of the most challenging issues in interactive simulations. Our approach is based on an implicit time integration of a non-linear finite element model. To enable real-time computations, we propose a new preconditioning technique based on an asynchronous update at low frequency. The preconditioner is not only used to improve the computation of the deformation of the tissues, but also to simulate the contact response of homogeneous and heterogeneous bodies with the same accuracy. We also address the problem of cutting heterogeneous structures and propose a method to update the preconditioner according to the topological modifications. Finally, we apply our approach to three challenging demonstrators: (i) a simulation of cataract surgery, (ii) a simulation of laparoscopic hepatectomy, and (iii) a brain tumor surgery.
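The asynchronous low-frequency preconditioner update can be caricatured with a scalar "stiffness" that drifts each frame while the preconditioner is refreshed only every few frames; a stale preconditioner still converges, just with more iterations. All numbers are invented:

```python
# Lagged-preconditioner toy: solve a*x = b each frame by preconditioned
# Richardson iteration x <- x + m*(b - a*x), where m = 1/a_ref is refreshed
# only every `refresh` frames while a drifts every frame.

def solve(a, b, m, tol=1e-10, max_it=1000):
    x, it = 0.0, 0
    while abs(b - a * x) > tol and it < max_it:
        x += m * (b - a * x)
        it += 1
    return x, it

def run(frames=20, refresh=5):
    a, b = 1.0, 1.0
    a_ref = a
    iters = []
    for f in range(frames):
        a *= 1.05                 # the "system" stiffens a little every frame
        if f % refresh == 0:
            a_ref = a             # low-frequency (asynchronous) refresh
        _, it = solve(a, b, 1.0 / a_ref)
        iters.append(it)
    return iters

iters_per_frame = run()
```

On refresh frames the iteration converges immediately; between refreshes the count grows with the drift but stays bounded, which is the trade the paper exploits to keep the simulation interactive.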
Kong, Fande; Cai, Xiao-Chuan
2017-03-24
Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
Recent advances in nonlinear implicit, electrostatic particle-in-cell (PIC) algorithms
NASA Astrophysics Data System (ADS)
Chen, Guangye; Chacón, Luis; Barnes, Daniel
2012-10-01
An implicit 1D electrostatic PIC algorithm [Chen, Chacón, Barnes, J. Comput. Phys. 230 (2011)] has been developed that satisfies exact energy and charge conservation. The algorithm employs a kinetic-enslaved Jacobian-free Newton-Krylov method [ibid.] that ensures nonlinear convergence while taking timesteps comparable to the dynamical timescale of interest. Here we present two main improvements of the algorithm. The first is the formulation of a preconditioner based on linearized fluid equations, which are closed using available particle information. The computational benefit is that solving the fluid system is much cheaper than the kinetic one. The effectiveness of the preconditioner in accelerating nonlinear iterations on challenging problems will be demonstrated. A second improvement is the generalization of Ref. 1 to curvilinear meshes [Chacón, Chen, Barnes, J. Comput. Phys., submitted (2012)], with a hybrid particle update of positions and velocities in logical and physical space, respectively [Swift, J. Comput. Phys. 126 (1996)]. The curvilinear algorithm remains exactly charge- and energy-conserving, and can be extended to multiple dimensions. We demonstrate the accuracy and efficiency of the algorithm with a 1D ion-acoustic shock wave simulation.
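The exact energy conservation claim can be illustrated for a single particle in a linear electrostatic field (a harmonic oscillator): the implicit midpoint (Crank-Nicolson) update conserves x² + v² to round-off for any timestep. This is a one-particle cartoon of implicit time advancement, not the authors' full PIC scheme:

```python
# One particle in the linear field E = -x. The implicit midpoint update
#   x1 = x + dt*(v + v1)/2,   v1 = v - dt*(x + x1)/2
# can be solved in closed form here, and it conserves x^2 + v^2 exactly.

def midpoint_step(x, v, dt):
    a = dt / 2.0
    denom = 1.0 + a * a
    x_new = ((1.0 - a * a) * x + dt * v) / denom
    v_new = ((1.0 - a * a) * v - dt * x) / denom
    return x_new, v_new

x, v, dt = 1.0, 0.0, 0.5   # a large timestep relative to the oscillation period
e0 = x * x + v * v
for _ in range(1000):
    x, v = midpoint_step(x, v, dt)
energy_drift = abs(x * x + v * v - e0)
```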
Comby, G.
1996-10-01
The Ceramic Electron Multiplier (CEM) is a compact, robust, linear, and fast multi-channel electron multiplier. The Multi-Layer Ceramic Technique (MLCT) makes it possible to build metallic dynodes inside a compact ceramic block. Activation of the metallic dynodes enhances their secondary electron emission (SEE). The CEM can be used in multi-channel photomultipliers, multi-channel light intensifiers, ion detection, spectroscopy, analysis of time-of-flight events, particle detection, or Cherenkov imaging detectors.
Modeling and Grid Generation of Iced Airfoils
NASA Technical Reports Server (NTRS)
Vickerman, Mary B.; Baez, Marivell; Braun, Donald C.; Hackenberg, Anthony W.; Pennline, James A.; Schilling, Herbert W.
2007-01-01
SmaggIce Version 2.0 is a software toolkit for geometric modeling and grid generation for two-dimensional, single- and multi-element, clean and iced airfoils. A previous version of SmaggIce was described in Preparing and Analyzing Iced Airfoils, NASA Tech Briefs, Vol. 28, No. 8 (August 2004), page 32. To recapitulate: Ice shapes make it difficult to generate quality grids around airfoils, yet these grids are essential for predicting ice-induced complex flow. This software efficiently creates high-quality structured grids with tools that are uniquely tailored for various ice shapes. SmaggIce Version 2.0 significantly enhances the previous version primarily by adding the capability to generate grids for multi-element airfoils. This version of the software is an important step in streamlining the aeronautical analysis of iced airfoils using computational fluid dynamics (CFD) tools. The user may prepare the ice shape, define the flow domain, decompose it into blocks, generate grids, modify/divide/merge blocks, and control grid density and smoothness. All these steps may be performed efficiently even for the difficult glaze and rime ice shapes. Providing the means to generate highly controlled grids near rough ice, the software includes the creation of a wrap-around block (called the "viscous sublayer block"), which is a thin, C-type block around the wake line and iced airfoil. For multi-element airfoils, the software makes use of grids that wrap around and fill in the areas between the viscous sublayer blocks for all elements that make up the airfoil. A scripting feature records the history of interactive steps, which can be edited and replayed later to produce other grids. Using this version of SmaggIce, ice shape handling and grid generation can become a practical engineering process, rather than a laborious research effort.
NASA Technical Reports Server (NTRS)
Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, Bil
2010-01-01
This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem-dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multi-physics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the FUN3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
NASA Technical Reports Server (NTRS)
Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, William L.
2013-01-01
This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem-dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multi-physics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the FUN3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
NASA Technical Reports Server (NTRS)
Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, Bil
2011-01-01
This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem-dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multi-physics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the FUN3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
van Eldijk, Mark B; Schoonen, Lise; Cornelissen, Jeroen J L M; Nolte, Roeland J M; van Hest, Jan C M
2016-05-01
Protein cages are an interesting class of biomaterials with potential applications in bionanotechnology. Therefore, substantial effort is spent on the development of capsule-forming designer polypeptides with a tailor-made assembly profile. The expanded assembly profile of a triblock copolypeptide consisting of a metal ion-chelating hexahistidine-tag, a stimulus-responsive elastin-like polypeptide block, and a pH-responsive morphology-controlling viral capsid protein is presented. The self-assembly of this multi-responsive protein-based block copolymer is triggered by the addition of divalent metal ions. This assembly process yields monodisperse nanocapsules with a 20 nm diameter composed of 60 polypeptides. The well-defined nanoparticles are the result of the emergent properties of all the blocks of the polypeptide. These results demonstrate the ability of hexahistidine-tags to function as supramolecular cross-linkers. Furthermore, their potential for the metal ion-mediated encapsulation of hexahistidine-tagged proteins is shown. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Hierarchical patch-based co-registration of differently stained histopathology slides
NASA Astrophysics Data System (ADS)
Yigitsoy, Mehmet; Schmidt, Günter
2017-03-01
Over the past decades, digital pathology has emerged as an alternative way of examining tissue at the subcellular level. It enables multiplexed analysis of different cell types at the micron level. Information about cell types can be extracted by staining sections of a tissue block using different markers. However, robust fusion of structural and functional information from different stains is necessary for reproducible multiplexed analysis. Such a fusion can be obtained via image co-registration by establishing spatial correspondences between tissue sections. Spatial correspondences can then be used to transfer various statistics about cell types between sections. However, the multi-modal nature of the images and the sparse distribution of interesting cell types pose several challenges for the registration of differently stained tissue sections. In this work, we propose a co-registration framework that efficiently addresses these challenges. We present a hierarchical patch-based registration of intensity-normalized tissue sections. Preliminary experiments demonstrate the potential of the proposed technique for the fusion of multi-modal information from differently stained digital histopathology sections.
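Patch-based matching, the building block of such a registration, can be sketched with a simple similarity search. The Python example below uses synthetic data and exhaustive normalized cross-correlation (NCC); a real multi-modal pipeline would typically combine intensity normalization with a more robust similarity measure and a hierarchical search, so treat this only as an illustration of the patch-matching step:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def best_match(patch, image, stride=1):
    """Exhaustive search for the location in `image` maximizing NCC."""
    ph, pw = patch.shape
    best, argbest = -2.0, (0, 0)
    for i in range(0, image.shape[0] - ph + 1, stride):
        for j in range(0, image.shape[1] - pw + 1, stride):
            s = ncc(patch, image[i:i + ph, j:j + pw])
            if s > best:
                best, argbest = s, (i, j)
    return argbest, best

# Synthetic "fixed" section and a noisy patch cut from a known location.
rng = np.random.default_rng(2)
fixed = rng.normal(size=(64, 64))
patch = fixed[20:36, 30:46] + rng.normal(scale=0.1, size=(16, 16))

loc, score = best_match(patch, fixed)
```

In a hierarchical scheme, this search would first run on coarse, downsampled sections and then refine the found offsets at successively finer resolutions.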
Test systems of the STS-XYTER2 ASIC: from wafer-level to in-system verification
NASA Astrophysics Data System (ADS)
Kasinski, Krzysztof; Zubrzycka, Weronika
2016-09-01
The STS/MUCH-XYTER2 ASIC is a full-size prototype chip for the Silicon Tracking System (STS) and Muon Chamber (MUCH) detectors in the new fixed-target experiment Compressed Baryonic Matter (CBM) at the FAIR center, Darmstadt, Germany. The STS assembly includes more than 14000 ASICs. The complicated, time-consuming, multi-step assembly process of the detector building blocks and tight quality assurance requirements require several intermediate tests to be performed to verify crucial assembly steps (e.g. custom microcable tab-bonding before wire-bonding to the PCB) and, if necessary, to identify channels or modules for rework. The chip supports multi-level testing with different probing/contact methods (wafer probe-card, pogo-probes, in-system tests). The huge number of ASICs to be tested restricts the number and kind of tests that can be performed within a reasonable time. The proposed architectures of the test stand equipment and a brief summary of the methodologies are presented in this paper.
Hermans, Erno J; Kanen, Jonathan W; Tambini, Arielle; Fernández, Guillén; Davachi, Lila; Phelps, Elizabeth A
2017-05-01
After encoding, memories undergo a process of consolidation that determines long-term retention. For conditioned fear, animal models postulate that consolidation involves reactivations of neuronal assemblies supporting fear learning during postlearning "offline" periods. However, no human studies to date have investigated such processes, particularly in relation to long-term expression of fear. We tested 24 participants using functional MRI on 2 consecutive days in a fear conditioning paradigm involving 1 habituation block, 2 acquisition blocks, and 2 extinction blocks on day 1, and 2 re-extinction blocks on day 2. Conditioning blocks were preceded and followed by 4.5-min rest blocks. Strength of spontaneous recovery of fear on day 2 served as a measure of long-term expression of fear. Amygdala connectivity primarily with hippocampus increased progressively during postacquisition and postextinction rest on day 1. Intraregional multi-voxel correlation structures within amygdala and hippocampus sampled during a block of differential fear conditioning furthermore persisted after fear learning. Critically, both these main findings were stronger in participants who exhibited spontaneous recovery 24 h later. Our findings indicate that neural circuits activated during fear conditioning exhibit persistent postlearning activity that may be functionally relevant in promoting consolidation of the fear memory. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Asgharzadeh, Hafez; Borazjani, Iman
2017-02-15
The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for non-linear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate, but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are among the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs), to solve non-linear systems of equations. The success of NKMs depends heavily on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve the unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for the matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against the Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90-degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation, depending on the grid (size) and the flow problem.
In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed-point Runge-Kutta method because it converged with higher time steps and in approximately 30% fewer iterations, even when the grid was stretched and the Reynolds number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than the NKM with a diagonal and full Jacobian, respectively, when the stretching factor was increased. The NKM with a diagonal analytical Jacobian and the matrix-free method with an analytical preconditioner are the fastest methods, and the superiority of one over the other depends on the flow problem. Furthermore, the implemented methods are fully parallelized with a parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
Asgharzadeh, Hafez; Borazjani, Iman
2016-01-01
The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for non-linear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate, but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are among the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs), to solve non-linear systems of equations. The success of NKMs depends heavily on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve the unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for the matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against the Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90-degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation, depending on the grid (size) and the flow problem.
In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed-point Runge-Kutta method because it converged with higher time steps and in approximately 30% fewer iterations, even when the grid was stretched and the Reynolds number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than the NKM with a diagonal and full Jacobian, respectively, when the stretching factor was increased. The NKM with a diagonal analytical Jacobian and the matrix-free method with an analytical preconditioner are the fastest methods, and the superiority of one over the other depends on the flow problem. Furthermore, the implemented methods are fully parallelized with a parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future. PMID:28042172
NASA Astrophysics Data System (ADS)
Asgharzadeh, Hafez; Borazjani, Iman
2017-02-01
The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for non-linear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate, but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are among the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs), to solve non-linear systems of equations. The success of NKMs depends heavily on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve the unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for the matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against the Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90-degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation, depending on the grid (size) and the flow problem.
In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed-point Runge-Kutta method because it converged with higher time steps and in approximately 30% fewer iterations, even when the grid was stretched and the Reynolds number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than the NKM with a diagonal and full Jacobian, respectively, when the stretching factor was increased. The NKM with a diagonal analytical Jacobian and the matrix-free method with an analytical preconditioner are the fastest methods, and the superiority of one over the other depends on the flow problem. Furthermore, the implemented methods are fully parallelized with a parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
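The core Newton iteration with an analytical Jacobian can be illustrated on a toy system. The numpy sketch below is purely illustrative: the 2x2 test problem is invented, and the dense linear solve stands in for the preconditioned Krylov (e.g. GMRES) solve that a true Newton-Krylov method would use at each step:

```python
import numpy as np

# Toy nonlinear system F(u) = 0 standing in for the discretized momentum
# equations; the functions and root are illustrative, not from the paper.
def F(u):
    x, y = u
    return np.array([x**2 + y - 3.0, x + y**2 - 5.0])

def J(u):
    # Analytical Jacobian, assembled cheaply in closed form -- the idea the
    # paper advocates over costly automatic differentiation.
    x, y = u
    return np.array([[2.0 * x, 1.0],
                     [1.0, 2.0 * y]])

def newton(u0, tol=1e-12, max_iter=50):
    u = np.asarray(u0, dtype=float)
    for k in range(max_iter):
        r = F(u)
        if np.linalg.norm(r) < tol:
            return u, k
        # In a Newton-Krylov method this dense solve would be replaced by a
        # Krylov iteration preconditioned with (an approximation of) J.
        du = np.linalg.solve(J(u), -r)
        u = u + du
    return u, max_iter

u_star, iters = newton([1.0, 1.0])
```

The quadratic convergence that makes Newton attractive is visible here: the residual drops below 1e-12 in a handful of iterations, whereas a fixed-point sweep on the same system would need many more.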
Numerical simulation of rough-surface aerodynamics
NASA Astrophysics Data System (ADS)
Chi, Xingkai
Computational fluid dynamics (CFD) simulations of flow over surfaces with roughness, in which the details of the surface geometry must be resolved, pose major challenges. The objective of this study is to address these challenges through two important engineering problems in which roughness plays a critical role---flow over airfoils with accrued ice, and flow and heat transfer over turbine blade surfaces roughened by erosion and/or deposition. CFD simulations of iced airfoils face two major challenges. The first is how to generate high-quality single- and multi-block structured grids for highly convoluted convex and concave surface geometries with multiple scales. In this study, two methods were developed for the generation of high-quality grids for such geometries. The method developed for single-block grids involves generating a grid about the clean airfoil, carving out a portion of that grid about the airfoil, replacing that portion with a grid that accounts for the accrued ice geometry, and performing elliptic smoothing. The method developed for multi-block grids involves a transition-layer grid to ensure jaggedness in the ice geometry does not propagate into the domain. It also involves a "thick" wrap-around grid about the ice to ensure grid lines clustered next to solid surfaces do not propagate as streaks of tightly packed grid lines into the domain along block boundaries. For multi-block grids, this study also developed blocking topologies that ensure solutions on multi-block grids converge to steady state as quickly as on single-block grids. The second major challenge in CFD simulations of iced airfoils is not knowing when the predictions will be reliable, because of uncertainties in the turbulence modeling.
In this study, the effects of turbulence models in predicting lift, drag, and moment coefficients were examined for airfoils with rime ice (i.e., ice with jaggedness only) and with glaze ice (i.e., ice with multiple protruding horns and surface jaggedness) as a function of angle of attack. In this examination, three different CFD codes---WIND, FLUENT, and PowerFLOW---were used to examine a variety of turbulence models, including Spalart-Allmaras, RNG k-epsilon, shear-stress transport, v2-f, and differential Reynolds stress with and without non-equilibrium wall functions. The accuracy of the CFD predictions was evaluated by comparing grid-independent solutions with measured experimental data. Results obtained show CFD with WIND and FLUENT to predict the aerodynamics of airfoils with rime ice reliably up to near stall for all turbulence models investigated. (Abstract shortened by UMI.)
NASA Astrophysics Data System (ADS)
Paulsen, Lee; Hoffmann, Ted; Fulton, Caleb; Yeary, Mark; Saunders, Austin; Thompson, Dan; Chen, Bill; Guo, Alex; Murmann, Boris
2015-05-01
Phased array systems offer numerous advantages to the modern warfighter in multiple application spaces, including Radar, Electronic Warfare, Signals Intelligence, and Communications. However, a lack of commonality in the underlying technology base for DoD Phased Arrays has led to static systems with long development cycles, slow technology refreshes in response to emerging threats, and expensive, application-specific sub-components. The IMPACT module (Integrated Multi-use Phased Array Common Tile) is a multi-channel, reconfigurable, cost-effective beamformer that provides a common building block for multiple, disparate array applications.
NASA Technical Reports Server (NTRS)
Luo, Victor; Khanampornpan, Teerapat; Boehmer, Rudy A.; Kim, Rachel Y.
2011-01-01
This software graphically displays all pertinent information from a Predicted Events File (PEF) using the Java Swing framework, which allows for multi-platform support. The PEF is hard to weed through when looking for specific information, and the MRO (Mars Reconnaissance Orbiter) Mission Planning & Sequencing Team (MPST) wanted a different way to visualize the data. This tool provides the team with a visual way of reviewing and error-checking the sequence product. The front end of the tool contains much of the aesthetically appealing material for viewing. The time stamp is displayed in the top left corner, and highlighted details are displayed in the bottom left corner. The time bar stretches along the top of the window, and the rest of the space is allotted for blocks and step functions. A preferences window is used to control the layout of the sections along with the ability to choose the color and size of the blocks. Double-clicking on a block will show information contained within the block. Zooming into a certain level will graphically display that information as an overlay on the block itself. Other functions include using hotkeys to navigate, an option to jump to a specific time, enabling a vertical line, and double-clicking to zoom in/out. The back end involves a configuration file that allows a more experienced user to pre-define the structure of a block, a single event, or a step function. The individual will have to determine what information is important within each block and what actually defines the beginning and end of a block. This gives the user much more flexibility in terms of what the tool searches for. In addition to this configurability, all the settings in the preferences window are saved in the configuration file as well.
Synergistic Anti-arrhythmic Effects in Human Atria with Combined Use of Sodium Blockers and Acacetin
Ni, Haibo; Whittaker, Dominic G.; Wang, Wei; Giles, Wayne R.; Narayan, Sanjiv M.; Zhang, Henggui
2017-01-01
Atrial fibrillation (AF) is the most common cardiac arrhythmia. Developing effective and safe anti-AF drugs remains an unmet challenge. Simultaneous block of both the atrial-specific ultra-rapid delayed rectifier potassium (K+) current (IKur) and the Na+ current (INa) has been hypothesized to be anti-AF, without inducing significant QT prolongation and ventricular side effects. However, the antiarrhythmic advantage of simultaneously blocking these two channels vs. individual block in the setting of AF-induced electrical remodeling remains to be documented. Furthermore, many IKur blockers such as acacetin and AVE0118 partially inhibit other K+ currents in the atria. Whether this multi-K+-block produces greater anti-AF effects compared with selective IKur-block has not been fully understood. The aim of this study was to use computer models to (i) assess the impact of multi-K+-block as exhibited by many IKur blockers, and (ii) evaluate the antiarrhythmic effect of blocking IKur and INa, either alone or in combination, on atrial and ventricular electrical excitation and recovery in the setting of AF-induced electrical remodeling. Contemporary mathematical models of human atrial and ventricular cells were modified to incorporate dose-dependent actions of acacetin (a multichannel blocker primarily inhibiting IKur while less potently blocking Ito, IKr, and IKs). Rate- and atrial-selective inhibition of INa was also incorporated into the models. These single-myocyte models were then incorporated into multicellular two-dimensional (2D) and three-dimensional (3D) anatomical models of the human atria. As expected, application of an IKur blocker produced pronounced action potential duration (APD) prolongation in atrial myocytes. Furthermore, combined multiple K+-channel block that mimicked the effects of acacetin exhibited synergistic APD prolongations. Synergistic anti-AF effects following inhibition of INa and combined IKur/K+-channel block were also observed.
The attainable maximal AF-selectivity of INa inhibition was greatly augmented by blocking IKur or multiple K+-currents in the atrial myocytes. These enhanced anti-arrhythmic effects of combined block of Na+- and K+-channels were also seen in 2D and 3D simulations; specifically, there was an enhanced efficacy in terminating re-entrant excitation waves, exerting improved antiarrhythmic effects in the human atria as compared to single-channel block. However, in human ventricular myocytes and tissue, cellular repolarization and computed QT intervals were only modestly affected in the presence of the actions of acacetin and INa blockers (either alone or in combination). In conclusion, this study demonstrates synergistic antiarrhythmic benefits of combined block of IKur and INa, as well as those of INa and the combined multi-K+-current block of acacetin, without significant alterations of ventricular repolarization and QT intervals. This approach may be a valuable strategy for the treatment of AF. PMID:29218016
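Dose-dependent channel block of the kind incorporated into such cell models is commonly expressed with a Hill equation that scales each channel's conductance. The Python sketch below illustrates the idea; the IC50 values, Hill coefficient, and dose are illustrative placeholders, not the fitted acacetin parameters from this study:

```python
import numpy as np

def fraction_blocked(dose, ic50, hill=1.0):
    """Fraction of a channel's conductance blocked at a given drug dose,
    via the Hill equation: D^h / (D^h + IC50^h)."""
    return dose**hill / (dose**hill + ic50**hill)

# Hypothetical per-current IC50 values (same units as dose); a multichannel
# blocker inhibits its primary target most potently and others less so.
ic50 = {"IKur": 3.2, "Ito": 9.3, "IKr": 32.0, "IKs": 81.0}
dose = 10.0

# Conductance scale factors applied to each current in the cell model.
scale = {cur: 1.0 - fraction_blocked(dose, v) for cur, v in ic50.items()}
```

In a myocyte model, each `scale` factor would multiply the corresponding maximal conductance before the ionic currents are evaluated, so one drug concentration simultaneously and differentially attenuates several currents.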
M-step preconditioned conjugate gradient methods
NASA Technical Reports Server (NTRS)
Adams, L.
1983-01-01
Preconditioned conjugate gradient methods for solving sparse, symmetric, positive definite systems of linear equations are described. Necessary and sufficient conditions are given for when these preconditioners can be used, and an analysis of their effectiveness is given. Efficient computer implementations of these methods are discussed, and results on the CYBER 203 and the Finite Element Machine under construction at NASA Langley Research Center are included.
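The preconditioned conjugate gradient framework that such methods build on can be sketched compactly. The numpy example below uses a simple Jacobi (diagonal) preconditioner on a 1D Laplacian for illustration, not the m-step polynomial preconditioners analyzed in the report; the matrix and right-hand side are illustrative stand-ins:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; M_inv applies the preconditioner
    (an approximation of A^-1) to a residual vector."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, k + 1
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # preconditioned search direction
        rz = rz_new
    return x, max_iter

# SPD test matrix: a 1D Laplacian, a stand-in for a stiffness matrix.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)
x, iters = pcg(A, b, lambda r: r / d)  # Jacobi (diagonal) preconditioner
```

Swapping in a different `M_inv`, e.g. an m-step polynomial in A applied to the residual, changes only the preconditioner callback, which is what makes the PCG framework convenient for comparing preconditioners.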
Parallel Performance of Linear Solvers and Preconditioners
2014-01-01
... are produced by a discrete dislocation dynamics (DDD) simulation and change with each timestep of the DDD simulation as the dislocation structure evolves. However, the coefficient—or stiffness—matrix remains constant during the DDD simulation and some expensive matrix factorizations only occur once ... discrete dislocation dynamics (DDD) simulations. This can be achieved by coupling a DDD simulator for bulk material (Arsenlis et al., 2007) to a ...
Xu, Enhua; Li, Shuhua
2013-11-07
The block correlated second-order perturbation theory with a generalized valence bond (GVB) reference (GVB-BCPT2) is proposed. In this approach, each geminal in the GVB reference is considered as a "multi-orbital" block (a subset of spin orbitals), and each occupied or virtual spin orbital is also taken as a single block. The zeroth-order Hamiltonian is set to be the summation of the individual Hamiltonians of all blocks (with explicit two-electron operators within each geminal) so that the GVB reference function and all excited configuration functions are its eigenfunctions. The GVB-BCPT2 energy can be directly obtained without iteration, just like the second-order Møller-Plesset perturbation method (MP2), both of which are size consistent. We have applied this GVB-BCPT2 method to investigate the equilibrium distances and spectroscopic constants of 7 diatomic molecules, conformational energy differences of 8 small molecules, and bond-breaking potential energy profiles in 3 systems. GVB-BCPT2 is demonstrated to have noticeably better performance than MP2 for systems with significant multi-reference character, and to provide reasonably accurate results for some systems with large active spaces, which are beyond the capability of all CASSCF-based methods.
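The non-iterative character of such a second-order energy can be illustrated with generic Rayleigh-Schrödinger perturbation theory on a small model Hamiltonian. This is the textbook formula underlying MP2-like methods, not the GVB-BCPT2 working equations, and all matrices below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zeroth-order Hamiltonian: diagonal, with a non-degenerate ground state,
# mimicking a sum of block Hamiltonians whose eigenfunctions are known.
E0 = np.array([0.0, 2.0, 3.5, 5.0])
H0 = np.diag(E0)

# Small symmetric perturbation with no diagonal part (illustrative).
V = rng.normal(scale=0.05, size=(4, 4))
V = (V + V.T) / 2.0
np.fill_diagonal(V, 0.0)

# Non-iterative second-order correction for the ground state (state 0):
#   E^(2) = sum_{n != 0} |V_{n0}|^2 / (E0_0 - E0_n)
e2 = sum(V[n, 0] ** 2 / (E0[0] - E0[n]) for n in range(1, 4))
e_pt2 = E0[0] + e2

# Reference: exact ground-state eigenvalue of the full Hamiltonian.
e_exact = np.linalg.eigvalsh(H0 + V)[0]
```

Because every term in the sum is known once the zeroth-order spectrum and coupling elements are in hand, the correction is obtained in a single pass, with no self-consistent iteration, which is the property the abstract highlights for GVB-BCPT2.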
NASA Astrophysics Data System (ADS)
Liu, Leibo; Chen, Yingjie; Yin, Shouyi; Lei, Hao; He, Guanghui; Wei, Shaojun
2014-07-01
A VLSI architecture for the entropy decoder, inverse quantiser, and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice-level pipelining, MB (Macro-Block) level pipelining, and MB-level parallelism, are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, effectively reducing the implementation overhead. Simulation shows that the decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frames per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams at a 200 MHz working frequency.
Efficient Iterative Methods Applied to the Solution of Transonic Flows
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.
1996-02-01
We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian, with preconditioned conjugate gradient-like iterative solvers for the solution of the linear systems in each Newton iteration. Two iterative solvers are tested: a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin), and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and to a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel Thinking Machines CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
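The ILU-preconditioned Krylov solve at the heart of such Newton-iterative methods can be sketched with SciPy (assumed available). Here a 2D 5-point Laplacian stands in for the linearized transonic operator at one Newton step, and GMRES plays the role of the Krylov solver; the mesh size and ILU parameters are illustrative:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 2D 5-point Laplacian as a stand-in for the Jacobian system of one
# Newton iteration (the actual TSD Jacobian is problem-specific).
m = 16
I = sp.identity(m)
T = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(m, m))
S = sp.diags([-1, -1], [-1, 1], shape=(m, m))
A = (sp.kron(I, T) + sp.kron(S, I)).tocsc()
b = np.ones(m * m)

# Incomplete LU factorization used as a right preconditioner for GMRES.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, ilu.solve)

x, info = spla.gmres(A, b, M=M)   # info == 0 signals convergence
```

Because ILU discards fill outside a prescribed pattern, its triangular solves stay sparse, which is what makes vectorized and parallel variants of the factorization practical on the machines discussed in the abstract.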
Multi-color incomplete Cholesky conjugate gradient methods for vector computers. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Poole, E. L.
1986-01-01
In this research, we are concerned with the solution on vector computers of linear systems of equations, Ax = b, where A is a large, sparse, symmetric positive definite matrix. We solve the system using an iterative method, the incomplete Cholesky conjugate gradient method (ICCG). We apply a multi-color strategy to obtain p-color matrices for which a block-oriented ICCG method is implemented on the CYBER 205. (A p-colored matrix is a matrix which can be partitioned into a p×p block matrix where the diagonal blocks are diagonal matrices.) This algorithm, which is based on a no-fill strategy, achieves O(N/p) length vector operations in both the decomposition of A and in the forward and back solves necessary at each iteration of the method. We discuss the natural ordering of the unknowns as an ordering that minimizes the number of diagonals in the matrix and define multi-color orderings in terms of disjoint sets of the unknowns. We give necessary and sufficient conditions to determine which multi-color orderings of the unknowns correspond to p-color matrices. A performance model is given which is used both to predict execution time for ICCG methods and to compare an ICCG method to conjugate gradient without preconditioning or to another ICCG method. Results are given from runs on the CYBER 205 at NASA's Langley Research Center for four model problems.
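The defining property of a p-color matrix can be checked directly for the simplest case: a red-black (2-color) ordering of a 1D Laplacian. The Python sketch below is machine-independent illustration only (the thesis targets the CYBER 205); it permutes the unknowns by color and verifies that the diagonal blocks become diagonal matrices:

```python
import numpy as np

# 1D Laplacian on n points; its adjacency graph is a path, so 2 colors suffice.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Red-black (2-color) ordering: all even-indexed unknowns first, then odd.
perm = np.r_[np.arange(0, n, 2), np.arange(1, n, 2)]
P = np.eye(n)[perm]                 # permutation matrix
A_rb = P @ A @ P.T                  # reordered system

# With unknowns grouped by color, the diagonal blocks of the 2x2 block
# partition are themselves diagonal -- the property that lets a no-fill
# ICCG sweep run as long vector operations of length ~N/p.
half = n // 2
A_rr = A_rb[:half, :half]
A_bb = A_rb[half:, half:]
```

The same check generalizes: an ordering is a valid p-coloring exactly when no two unknowns of the same color are coupled by A, which is what forces each diagonal block to be diagonal.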
Multi-DSP and FPGA based Multi-channel Direct IF/RF Digital receiver for atmospheric radar
NASA Astrophysics Data System (ADS)
Yasodha, Polisetti; Jayaraman, Achuthan; Kamaraj, Pandian; Durga rao, Meka; Thriveni, A.
2016-07-01
Modern phased array radars depend heavily on digital signal processing (DSP) to extract the echo signal information and to achieve reliability along with programmability and flexibility. The advent of ASIC technology has allowed various digital signal processing steps to be realized in a single DSP chip, which can be programmed as per the application, can handle high data rates, and can be used in the radar receiver to process the received signal. More recently, field-programmable gate array (FPGA) chips, which can be re-programmed, also present an opportunity to process the radar signal. A multi-channel direct IF/RF digital receiver (MCDRx) has been developed at NARL, taking advantage of high-speed ADCs and high-performance DSP chips/FPGAs, for use in atmospheric radars working in the HF/VHF bands. Multiple channels allow the radar to be operated in multi-receiver modes and to obtain the wind vector with improved time resolution, without switching the antenna beam. The MCDRx has six channels, implemented on a custom-built digital board realized using six ADCs for simultaneous processing of the six input signals, a Xilinx Virtex-5 FPGA and a Spartan-6 FPGA, and two ADSP-TS201 DSP chips, each of which performs one phase of processing. The MCDRx unit interfaces with the data storage/display computer via two gigabit Ethernet (GbE) links. One of the six channels is dedicated to Doppler beam swinging (DBS) mode and the other five channels to multi-receiver mode operations. Each channel has (i) an ADC block, to digitize the RF/IF signal, (ii) a DDC block for digital down-conversion of the digitized signal, (iii) a decoding block to decode the phase-coded signal, and (iv) a coherent integration block for integrating the data while preserving phase. The ADC block consists of Analog Devices AD9467 16-bit ADCs, which digitize the input signal at 80 MSPS. The output of the ADC is centered around (80 MHz - input frequency).
The digitized data is fed to the DDC block, which down-converts it to baseband. The DDC block has an NCO, a mixer and two chains of filters (a fifth-order cascaded integrator-comb filter, two FIR filters, two half-band filters and programmable FIR filters) for the in-phase (I) and quadrature-phase (Q) channels. The NCO has 32 bits and is set to match the output frequency of the ADC. The DDC also down-samples (decimates) the data, reducing the data rate to 16 MSPS. This data is further decimated down to 4/2/1/0.5/0.25/0.125/0.0625 MSPS for baud lengths of 0.25/0.5/1/2/4/8/16 μs, respectively. The down-sampled data is then fed to the decoding block, which performs cross-correlation to achieve pulse compression of the binary-phase-coded data, obtaining better range resolution with the maximum possible height coverage. This step improves the signal power by a factor equal to the length of the code. The coherent integration block integrates the decoded data coherently over successive pulses, which improves the signal-to-noise ratio and reduces the data volume. The DDC, decoding and coherent integration blocks are implemented in the Xilinx Virtex-5 FPGA. Up to this point, the function of all six channels is the same for DBS mode and the multi-receiver modes. Data from the Virtex-5 FPGA is transferred to a PC via the GbE-1 interface for the multi-receiver modes, or to two Analog Devices ADSP-TS201 DSP chips (A and B) via a link port for DBS mode. The ADSP-TS201 chips perform normalization, DC removal, windowing, FFT computation and spectral averaging on the data, which is then transferred to the storage/display PC via the GbE-2 interface for real-time display and storage. The physical layer of the GbE interface is implemented in an external chip (Marvell 88E1111) and the MAC layer is implemented inside the Virtex-5 FPGA. The MCDRx has a total of 4 GB of DDR2 memory for data storage. The Spartan-6 FPGA is used for generating the timing signals required for basic operation of the radar and for testing of the MCDRx.
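The digital down-conversion stage described above (NCO mix, low-pass filter, decimate) can be sketched in a few lines. This toy version replaces the CIC/FIR/half-band chain with a simple boxcar low-pass and uses an illustrative IF frequency; it is not the NARL implementation.

```python
import numpy as np

def ddc(samples, fs, f_if, decim):
    """Toy digital down-converter: multiply by an NCO to shift the IF
    signal to baseband, low-pass with a boxcar (a crude stand-in for the
    real CIC/FIR/half-band chain), then decimate to cut the sample rate."""
    n = np.arange(len(samples))
    nco = np.exp(-2j * np.pi * f_if / fs * n)    # numerically controlled oscillator
    base = samples * nco                          # complex mix to baseband (I + jQ)
    lp = np.convolve(base, np.ones(decim) / decim, mode="same")
    return lp[::decim]                            # decimation

fs, f_if = 80e6, 17.5e6                           # 80 MSPS ADC, illustrative IF
t = np.arange(4096) / fs
x = np.cos(2 * np.pi * f_if * t)                  # a pure IF tone
iq = ddc(x, fs, f_if, decim=16)
# after mixing, the tone lands at DC with amplitude 1/2 on the I channel
print(abs(iq[64:-64].mean() - 0.5) < 0.05)
```

In the real receiver the filter chain also shapes the passband per the selected baud length; the boxcar here only illustrates the mix-filter-decimate structure.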
Patel, Sarthak K; Lavasanifar, Afsaneh; Choi, Phillip
2010-03-01
Molecular dynamics simulation was used to study the potential of using a block copolymer containing three poly(epsilon-caprolactone) (PCL) blocks of equal length connected to one end of a poly(ethylene oxide) (PEO) block, designated as PEO-b-3PCL, to encapsulate two classes of hydrophobic drugs with distinctively different molecular structures. In particular, the first class of drugs consisted of two cucurbitacin drugs (CuB and CuI) that contain multiple hydrogen bond donors and acceptors evenly distributed over their molecules, while the other class of drugs (fenofibrate and nimodipine) contains essentially only clustered hydrogen bond acceptors. In the case of the cucurbitacin drugs, the results showed that PEO-b-3PCL lowered the Flory-Huggins interaction parameters (chi) considerably (i.e., increased the drug solubility) compared to the linear di-block copolymer PEO-b-PCL with the same PCL/PEO (w/w) ratio of 1.0. However, the opposite effect was observed for fenofibrate and nimodipine. Analysis of the intermolecular interactions indicates that the number of hydrogen bonds formed between the three PCL blocks and the cucurbitacin drugs is significantly higher than for the linear di-block copolymer. On the other hand, the absence of hydrogen bond donors and the clustering of the hydrogen bond acceptors on the fenofibrate and nimodipine molecules significantly reduce the number of hydrogen bonds formed in the multi-PCL block environment, leading to unfavourable chi values. The findings of the present work suggest that a multi-hydrophobic-block architecture could potentially increase the drug loading for hydrophobic drugs whose structures contain evenly distributed multiple hydrogen bond donors and acceptors. (c) 2009 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Corona, Enrique; Nutter, Brian; Mitra, Sunanda; Guo, Jiangling; Karp, Tanja
2008-03-01
Efficient retrieval of high quality Regions-Of-Interest (ROI) from high resolution medical images is essential for reliable interpretation and accurate diagnosis. Random access to high quality ROI from codestreams is becoming an essential feature in many still image compression applications, particularly in viewing diseased areas from large medical images. This feature is easier to implement in block-based codecs because of the inherent spatial independence of the code blocks. This independence implies that the decoding order of the blocks is unimportant as long as the position of each is properly identified. In contrast, wavelet-tree based codecs naturally use some interdependence that exploits the decaying spectrum model of the wavelet coefficients. Thus one must keep track of the decoding order from level to level with such codecs. We have developed an innovative multi-rate image subband coding scheme using "Backward Coding of Wavelet Trees (BCWT)" which is fast, memory efficient, and resolution scalable. It offers far less complexity than many other existing codecs, including both wavelet-tree and block-based algorithms. The ROI feature in BCWT is implemented through a transcoder stage that generates a new BCWT codestream containing only the information associated with the user-defined ROI. This paper presents an efficient technique that locates a particular ROI within the BCWT coded domain and decodes it back to the spatial domain. This technique allows better access and proper identification of pathologies in high resolution images, since only a small fraction of the codestream is required to be transmitted and analyzed.
A modified conjugate gradient method based on the Tikhonov system for computerized tomography (CT).
Wang, Qi; Wang, Huaxiang
2011-04-01
During the past few decades, computerized tomography (CT) has been widely used for non-destructive testing (NDT) and non-destructive examination (NDE) in industry because of its non-invasiveness and visibility. Recently, CT technology has been applied to multi-phase flow measurement. Using the principle of radiation attenuation measurements along different directions through the investigated object, together with a special reconstruction algorithm, cross-sectional information about the scanned object can be worked out. This is a typical inverse problem and has always been a challenge because of its nonlinearity and ill-conditioning. The Tikhonov regularization method is widely used for similar ill-posed problems. However, the conventional Tikhonov method does not provide reconstructions of sufficient quality; the relative errors between the reconstructed images and the real distribution need to be further reduced. In this paper, a modified conjugate gradient (CG) method is applied to a Tikhonov system (the MCGT method) for reconstructing CT images. The computational load is dominated by the number of independent measurements m, and a preconditioner is introduced to lower the condition number of the Tikhonov system. Both simulation and experimental results indicate that the proposed method can reduce the computational time and improve the quality of image reconstruction. Copyright © 2010 ISA. Published by Elsevier Ltd. All rights reserved.
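The basic Tikhonov-plus-CG structure can be sketched as follows: plain conjugate gradient applied to the regularized normal equations. This is a simplified, unpreconditioned stand-in for the paper's MCGT scheme, with a synthetic ill-conditioned forward model in place of real projection data.

```python
import numpy as np

def tikhonov_cg(A, b, lam, iters=200, tol=1e-10):
    """Conjugate gradient on the Tikhonov normal equations
    (A^T A + lam*I) x = A^T b. A plain-CG sketch of the idea behind the
    paper's modified, preconditioned CG method; names are illustrative."""
    x = np.zeros(A.shape[1])
    matvec = lambda v: A.T @ (A @ v) + lam * v   # regularized operator
    r = A.T @ b - matvec(x)
    p = r.copy()
    rr = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) @ np.diag(1.0 / np.arange(1, 21))  # ill-conditioned model
x_true = rng.standard_normal(20)
b = A @ x_true + 1e-3 * rng.standard_normal(40)                      # noisy measurements
x = tikhonov_cg(A, b, lam=1e-2)
grad = A.T @ (A @ x - b) + 1e-2 * x              # normal-equation residual
print(np.linalg.norm(grad) < 1e-6)
```

The regularization parameter lam trades data fit against noise amplification; the preconditioner the paper adds would lower the condition number of the regularized operator and cut the CG iteration count.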
Parasuraman, Raja; Kidwell, Brian; Olmstead, Ryan; Lin, Ming-Kuan; Jankord, Ryan; Greenwood, Pamela
2014-06-01
We examined whether a gene known to influence dopamine availability in the prefrontal cortex is associated with individual differences in learning a supervisory control task. Methods are needed for selection and training of human operators who can effectively supervise multiple unmanned vehicles (UVs). Compared to the valine (Val) allele, the methionine (Met) allele of the COMT gene has been linked to superior executive function, but it is not known whether it is associated with training-related effects in multi-UV supervisory control performance. Ninety-nine healthy adults were genotyped for the COMT Val158Met single nucleotide polymorphism (rs4680) and divided into Met/Met, Val/Met, and Val/Val groups. Participants supervised six UVs in an air defense mission requiring them to attack incoming enemy aircraft and protect a no-fly zone from intruders in conditions of low and high task load (numbers of enemy aircraft). Training effects were examined across four blocks of trials in each task load condition. Compared to the Val/Met and Val/Val groups, Met/Met individuals exhibited a greater increase in enemy targets destroyed and greater reduction in enemy red zone incursions across training blocks. Individuals with the COMT Met/Met genotype can acquire skill in executive function tasks, such as multi-UV supervisory control, to a higher level and/or faster than other genotype groups. Potential applications of this research include the development of individualized training methods for operators of multi-UV systems and selecting personnel for complex supervisory control tasks.
Convolutional neural network for road extraction
NASA Astrophysics Data System (ADS)
Li, Junping; Ding, Yazhou; Feng, Fajie; Xiong, Baoyu; Cui, Weihong
2017-11-01
In this paper, a convolutional neural network with large-block input and small-block output was used to extract roads. To reflect the complex road characteristics in the study area, a deep convolutional neural network, VGG19, was used for road extraction. Based on an analysis of the characteristics of different input and output block sizes and the resulting extraction quality, the votes of several deep convolutional neural networks were used as the final road prediction. The study image was a GF-2 panchromatic and multi-spectral fusion image of Yinchuan. The precision of road extraction was 91%. The experiments showed that model averaging can improve the accuracy to some extent. At the same time, this paper gives some advice about the choice of input and output block sizes.
Advanced information processing system: Local system services
NASA Technical Reports Server (NTRS)
Burkhardt, Laura; Alger, Linda; Whittredge, Roy; Stasiowski, Peter
1989-01-01
The Advanced Information Processing System (AIPS) is a multi-computer architecture composed of hardware and software building blocks that can be configured to meet a broad range of application requirements. The hardware building blocks are fault-tolerant, general-purpose computers, fault-and damage-tolerant networks (both computer and input/output), and interfaces between the networks and the computers. The software building blocks are the major software functions: local system services, input/output, system services, inter-computer system services, and the system manager. The foundation of the local system services is an operating system with the functions required for a traditional real-time multi-tasking computer, such as task scheduling, inter-task communication, memory management, interrupt handling, and time maintenance. Resting on this foundation are the redundancy management functions necessary in a redundant computer and the status reporting functions required for an operator interface. The functional requirements, functional design and detailed specifications for all the local system services are documented.
Optimal domain decomposition strategies
NASA Technical Reports Server (NTRS)
Yoon, Yonghyun; Soni, Bharat K.
1995-01-01
The primary interest of the authors is in the area of grid generation, in particular, optimal domain decomposition about realistic configurations. A grid generation procedure with optimal blocking strategies has been developed to generate multi-block grids for a circular-to-rectangular transition duct. The focus of this study is the domain decomposition which optimizes solution algorithm/block compatibility based on geometrical complexities as well as the physical characteristics of the flow field. The progress realized in this study is summarized in this paper.
NASA Astrophysics Data System (ADS)
Sun, Y. S.; Zhang, L.; Xu, B.; Zhang, Y.
2018-04-01
Accurate positioning of optical satellite imagery without ground control is a precondition for remote sensing applications and small/medium-scale mapping of large areas abroad or with large volumes of images. In this paper, considering the geometric features of optical satellite imagery, and building on a widely used optimization method for constrained problems, the Alternating Direction Method of Multipliers (ADMM), together with RFM least-squares block adjustment, we propose a GCP-independent block adjustment method for large-scale domestic high resolution optical satellite imagery - GISIBA (GCP-Independent Satellite Imagery Block Adjustment), which is easy to parallelize and highly efficient. In this method, virtual "average" control points are built to solve the rank-defect problem and to support qualitative and quantitative analysis in block adjustment without ground control. The test results prove that the horizontal and vertical accuracies for multi-covered and multi-temporal satellite images are better than 10 m and 6 m, respectively. Meanwhile, the mosaicking problem of adjacent areas in large-area DOM production can be solved if public geographic information data is introduced as horizontal and vertical constraints in the block adjustment process. Finally, through experiments using GF-1 and ZY-3 satellite images over several typical test areas, the reliability, accuracy and performance of the developed procedure are presented and studied.
Patterning nonisometric origami in nematic elastomer sheets
NASA Astrophysics Data System (ADS)
Plucinsky, Paul; Kowalski, Benjamin A.; White, Timothy J.; Bhattacharya, Kaushik
Nematic elastomers dramatically change their shape in response to diverse stimuli including light and heat. In this paper, we provide a systematic framework for the design of complex three-dimensional shapes through the actuation of heterogeneously patterned nematic elastomer sheets. These sheets are composed of nonisometric origami building blocks which, when appropriately linked together, can actuate into a diverse array of three-dimensional faceted shapes. We demonstrate both theoretically and experimentally that: (1) the nonisometric origami building blocks actuate in the predicted manner, (2) the integration of multiple building blocks leads to complex multi-stable, yet predictable, shapes, (3) we can bias the actuation experimentally to obtain a desired complex shape amongst the multi-stable shapes. We then show that this experimentally realized functionality enables a rich possible design landscape for actuation using nematic elastomers. We highlight this landscape through theoretical examples, which utilize large arrays of these building blocks to realize a desired three-dimensional origami shape. In combination, these results amount to an engineering design principle, which we hope will provide a template for the application of nematic elastomers to emerging technologies.
NASA Technical Reports Server (NTRS)
Spekreijse, S. P.; Boerstoel, J. W.; Vitagliano, P. L.; Kuyvenhoven, J. L.
1992-01-01
About five years ago, a joint development of a flow simulation system for engine-airframe integration studies on propeller as well as jet aircraft was started. The initial system was based on the Euler equations and was made operational for industrial aerodynamic design work. The system consists of three major components: a domain modeller, for the graphical interactive subdivision of flow domains into an unstructured collection of blocks; a grid generator, for the graphical interactive computation of structured grids in blocks; and a flow solver, for the computation of flows on multi-block grids. The industrial partners of the collaboration and NLR have demonstrated that the domain modeller, grid generator and flow solver can be applied to simulate Euler flows around complete aircraft, including propulsion system simulation. Extension to Navier-Stokes flows is in progress. Delft Hydraulics has shown that both the domain modeller and grid generator can also be applied successfully to hydrodynamic configurations. An overview is given of the main aspects of both domain modelling and grid generation.
A multi-block adaptive solving technique based on lattice Boltzmann method
NASA Astrophysics Data System (ADS)
Zhang, Yang; Xie, Jiahua; Li, Xiaoyue; Ma, Zhenghai; Zou, Jianfeng; Zheng, Yao
2018-05-01
In this paper, a parallel adaptive CFD algorithm is developed in-house by combining the multi-block lattice Boltzmann method (LBM) with adaptive mesh refinement (AMR). The mesh refinement criterion of this algorithm is based on the density, velocity and vortices of the flow field. The refined grid boundary is obtained by extending outward half a ghost cell from the coarse grid boundary, which makes the adaptive mesh more compact and the boundary treatment more convenient. Two numerical examples, backward-facing step flow separation and unsteady flow around a circular cylinder, demonstrate that the method captures the vortex structures of the cold flow field accurately.
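The per-block kernel that such a multi-block AMR scheme applies at each refinement level is a standard LBM collision-and-streaming step. A minimal single-grid D2Q9 BGK sketch (uniform grid, periodic boundaries; the multi-block and refinement logic is omitted, and all parameters are illustrative):

```python
import numpy as np

# D2Q9 lattice: weights and discrete velocities
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])

def equilibrium(rho, ux, uy):
    """Second-order Maxwellian equilibrium distributions."""
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * usq)

def lbm_step(f, tau=0.8):
    """One BGK collision + periodic streaming step on a single block."""
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f = f + (equilibrium(rho, ux, uy) - f) / tau      # BGK collision
    for i in range(9):                                 # streaming along c_i
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
    return f

nx = ny = 32
rho0 = 1.0 + 0.01 * np.random.default_rng(1).standard_normal((nx, ny))
f = equilibrium(rho0, np.zeros((nx, ny)), np.zeros((nx, ny)))
mass0 = f.sum()
for _ in range(50):
    f = lbm_step(f)
print(abs(f.sum() - mass0) < 1e-9)   # collision and streaming conserve mass
```

In a multi-block AMR setting, coarse and fine blocks run this kernel at different space-time resolutions and exchange rescaled distributions through the ghost-cell layer mentioned in the abstract.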
Comparison Between Surf and Multi-Shock Forest Fire High Explosive Burn Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greenfield, Nicholas Alexander
PAGOSA [1] has several different burn models used to model high explosive detonation. Two of these, Multi-Shock Forest Fire and Surf, are capable of modeling shock initiation. Accurately calculating shock initiation of a high explosive is important because it is a mechanism for detonation in many accident scenarios (e.g., fragment impact). Comparing the models to pop-plot data gives confidence that the models are accurately calculating detonation, or the lack thereof. To compare the performance of these models, pop-plots [2] were created from simulations in which one 2 cm block of PBX 9502 collides with another block of PBX 9502.
Aerial multi-camera systems: Accuracy and block triangulation issues
NASA Astrophysics Data System (ADS)
Rupnik, Ewelina; Nex, Francesco; Toschi, Isabella; Remondino, Fabio
2015-03-01
Oblique photography has reached its maturity and has now been adopted for several applications. The number and variety of multi-camera oblique platforms available on the market is continuously growing. So far, few attempts have been made to study the influence of the additional cameras on the behaviour of the image block and comprehensive revisions to existing flight patterns are yet to be formulated. This paper looks into the precision and accuracy of 3D points triangulated from diverse multi-camera oblique platforms. Its coverage is divided into simulated and real case studies. Within the simulations, different imaging platform parameters and flight patterns are varied, reflecting both current market offerings and common flight practices. Attention is paid to the aspect of completeness in terms of dense matching algorithms and 3D city modelling - the most promising application of such systems. The experimental part demonstrates the behaviour of two oblique imaging platforms in real-world conditions. A number of Ground Control Point (GCP) configurations are adopted in order to point out the sensitivity of tested imaging networks and arising block deformations. To stress the contribution of slanted views, all scenarios are compared against a scenario in which exclusively nadir images are used for evaluation.
Mo, Xuejun; Li, Qiushi; Yi Lui, Lena Wai; Zheng, Baixue; Kang, Chiang Huen; Nugraha, Bramasta; Yue, Zhilian; Jia, Rui Rui; Fu, Hong Xia; Choudhury, Deepak; Arooz, Talha; Yan, Jie; Lim, Chwee Teck; Shen, Shali; Hong Tan, Choon; Yu, Hanry
2010-10-01
Tissue constructs that mimic the in vivo cell-cell and cell-matrix interactions are especially useful for applications involving the cell-dense and matrix-poor internal organs. Rapid and precise arrangement of cells into functional tissue constructs remains a challenge in tissue engineering. We demonstrate rapid assembly of C3A cells into multi-cell structures using a dendrimeric intercellular linker. The linker is composed of oleyl-polyethylene glycol (PEG) derivatives conjugated to a 16-arm polypropylenimine hexadecaamine (DAB) dendrimer. The positively charged multivalent dendrimer concentrates the linker onto the negatively charged cell surface to facilitate efficient insertion of the hydrophobic oleyl groups into the cellular membrane. Bringing linker-treated cells into close proximity to each other via mechanical means such as centrifugation and micromanipulation enables their rapid assembly into multi-cellular structures within minutes. The cells exhibit high levels of viability, proliferation, three-dimensional (3D) cell morphology and other functions in the constructs. We constructed defined multi-cellular structures such as rings, sheets or branching rods that can serve as potential tissue building blocks to be further assembled into complex 3D tissue constructs for biomedical applications. © 2010 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Turcksin, Bruno; Ragusa, Jean C.; Morel, Jim E.
2012-01-01
It is well known that the diffusion synthetic acceleration (DSA) methods for the Sn equations become ineffective in the Fokker-Planck forward-peaked scattering limit. In response to this deficiency, Morel and Manteuffel (1991) developed an angular multigrid method for the 1-D Sn equations. This method is very effective, costing roughly twice as much as DSA per source iteration, and yielding a maximum spectral radius of approximately 0.6 in the Fokker-Planck limit. Pautz, Adams, and Morel (PAM) (1999) later generalized the angular multigrid to 2-D, but it was found that the method was unstable with sufficiently forward-peaked mappings between the angular grids. The method was stabilized via a filtering technique based on diffusion operators, but this filtering also degraded the effectiveness of the overall scheme. The spectral radius was not bounded away from unity in the Fokker-Planck limit, although the method remained more effective than DSA. The purpose of this article is to recast the multidimensional PAM angular multigrid method without the filtering as an Sn preconditioner and use it in conjunction with the Generalized Minimal RESidual (GMRES) Krylov method. The approach ensures stability and our computational results demonstrate that it is also significantly more efficient than an analogous DSA-preconditioned Krylov method.
NASA Astrophysics Data System (ADS)
Aubry, R.; Oñate, E.; Idelsohn, S. R.
2006-09-01
The method presented in Aubry et al. (Comput Struc 83:1459-1475, 2005) for the solution of an incompressible viscous fluid flow with heat transfer using a fully Lagrangian description of motion is extended to three dimensions (3D) with particular emphasis on mass conservation. A modified fractional step (FS) method based on the pressure Schur complement (Turek 1999), and related to the class of algebraic splittings of Quarteroni et al. (Comput Methods Appl Mech Eng 188:505-526, 2000), is used, and a new advantage of these splittings of the equations compared with the classical FS is highlighted for free surface problems. The temperature is semi-coupled with the displacement, which is the main variable in a Lagrangian description. Comparisons for various mesh Reynolds numbers are performed with the classical FS, an algebraic splitting and a monolithic solution, in order to illustrate the behaviour of the Uzawa operator and the mass conservation. As the classical fractional step is equivalent to one iteration of the Uzawa algorithm performed with a standard Laplacian as a preconditioner, it behaves well only in the mesh Reynolds number range where this preconditioner is efficient. Numerical results are provided to assess the superiority of the modified algebraic splitting over the classical FS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shadid, John Nicolas; Lin, Paul Tinphone
2009-01-01
This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a capacity cluster with 272 compute nodes based on a homogeneous multicore node architecture utilizing 16 cores. The inter-node communication backbone for this Tri-Lab Linux Capacity Cluster (TLCC) machine is comprised of an InfiniBand interconnect. The nonuniform memory access (NUMA) nodes consist of 2.2 GHz quad socket/quad core AMD Opteron processors. The performance results for this study are obtained with a FE semiconductor device simulation code (Charon) that is based on a fully-coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. Scaling and multicore performance results are presented for large-scale problems of 100+ million unknowns on up to 4096 cores. A parallel scaling comparison is also presented with the Cray XT3/4 Red Storm capability platform. The results indicate that an MPI-only programming model for utilizing the multicore nodes is reasonably efficient on all 16 cores per compute node. However, the results also indicated that the multilevel preconditioner, which is critical for large-scale capability type simulations, scales better on the Red Storm machine than the TLCC machine.
Svyatsky, Daniil; Lipnikov, Konstantin
2017-03-18
Richards’ equation describes steady-state or transient flow in a variably saturated medium. For a medium having multiple layers of soils that are not aligned with coordinate axes, a mesh fitted to these layers is no longer orthogonal and the classical two-point flux approximation finite volume scheme is no longer accurate. Here, we propose new second-order accurate nonlinear finite volume (NFV) schemes for the head and pressure formulations of Richards’ equation. We prove that the discrete maximum principles hold for both formulations at steady-state, which mimics similar properties of the continuum solution. The second-order accuracy is achieved using high-order upwind algorithms for the relative permeability. Numerical simulations of water infiltration into a dry soil show a significant advantage of the second-order NFV schemes over the first-order NFV schemes, even on coarse meshes. Since explicit calculation of the Jacobian matrix becomes prohibitively expensive for high-order schemes due to built-in reconstruction and slope limiting algorithms, we study numerically the preconditioning strategy introduced recently in Lipnikov et al. (2016) that uses a stable approximation of the continuum Jacobian. Lastly, numerical simulations show that the new preconditioner reduces computational cost up to 2-3 times in comparison with conventional preconditioners.
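For context, the classical two-point flux approximation that these NFV schemes improve upon can be sketched for a steady 1-D diffusion problem. This is only the orthogonal-mesh baseline with an illustrative permeability field, not the paper's nonlinear scheme.

```python
import numpy as np

# Cell-centered two-point flux approximation (TPFA) for -(k u')' = q on
# (0,1) with u = 0 at both ends; face transmissibilities use harmonic
# means of the adjacent cell permeabilities.
n = 64
h = 1.0 / n
xc = (np.arange(n) + 0.5) * h                   # cell centers
k = 1.0 + xc                                     # illustrative permeability field
A = np.zeros((n, n))
b = np.ones(n) * h                               # unit source, integrated per cell
for i in range(n - 1):                           # interior faces
    T = 2.0 * k[i] * k[i + 1] / (k[i] + k[i + 1]) / h
    A[i, i] += T
    A[i + 1, i + 1] += T
    A[i, i + 1] -= T
    A[i + 1, i] -= T
A[0, 0] += 2.0 * k[0] / h                        # Dirichlet ends via half-cell
A[-1, -1] += 2.0 * k[-1] / h                     # transmissibilities
u = np.linalg.solve(A, b)
# A is a symmetric M-matrix, so a discrete maximum principle holds:
# with a positive source the solution is positive everywhere
print(u.min() > 0)
```

On a mesh fitted to tilted soil layers this two-point stencil loses consistency, which is exactly the deficiency that motivates the multi-point, nonlinear flux approximations of the paper.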
Convergence Acceleration of Runge-Kutta Schemes for Solving the Navier-Stokes Equations
NASA Technical Reports Server (NTRS)
Swanson, Roy C., Jr.; Turkel, Eli; Rossow, C.-C.
2007-01-01
The convergence of a Runge-Kutta (RK) scheme with multigrid is accelerated by preconditioning with a fully implicit operator. With the extended stability of the Runge-Kutta scheme, CFL numbers as high as 1000 can be used. The implicit preconditioner addresses the stiffness in the discrete equations associated with stretched meshes. This RK/implicit scheme is used as a smoother for multigrid. Fourier analysis is applied to determine damping properties. Numerical dissipation operators based on the Roe scheme, a matrix dissipation, and the CUSP scheme are considered in evaluating the RK/implicit scheme. In addition, the effect of the number of RK stages is examined. Both the numerical and computational efficiency of the scheme with the different dissipation operators are discussed. The RK/implicit scheme is used to solve the two-dimensional (2-D) and three-dimensional (3-D) compressible, Reynolds-averaged Navier-Stokes equations. Turbulent flows over an airfoil and wing at subsonic and transonic conditions are computed. The effects of the cell aspect ratio on convergence are investigated for Reynolds numbers between 5.7 x 10(exp 6) and 100 x 10(exp 6). It is demonstrated that the implicit preconditioner can reduce the computational time of a well-tuned standard RK scheme by a factor between four and ten.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures
Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone
2018-01-01
We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
DOE Office of Scientific and Technical Information (OSTI.GOV)
Svyatsky, Daniil; Lipnikov, Konstantin
Richards' equation describes steady-state or transient flow in a variably saturated medium. For a medium having multiple layers of soils that are not aligned with the coordinate axes, a mesh fitted to these layers is no longer orthogonal and the classical two-point flux approximation finite volume scheme is no longer accurate. Here, we propose new second-order accurate nonlinear finite volume (NFV) schemes for the head and pressure formulations of Richards' equation. We prove that the discrete maximum principles hold for both formulations at steady state, which mimics similar properties of the continuum solution. The second-order accuracy is achieved using high-order upwind algorithms for the relative permeability. Numerical simulations of water infiltration into a dry soil show a significant advantage of the second-order NFV schemes over the first-order NFV schemes even on coarse meshes. Since explicit calculation of the Jacobian matrix becomes prohibitively expensive for high-order schemes due to the built-in reconstruction and slope-limiting algorithms, we study numerically the preconditioning strategy introduced recently in Lipnikov et al. (2016) that uses a stable approximation of the continuum Jacobian. Lastly, numerical simulations show that the new preconditioner reduces computational cost up to 2–3 times in comparison with conventional preconditioners.
Efficient Coupling of Fluid-Plasma and Monte-Carlo-Neutrals Models for Edge Plasma Transport
NASA Astrophysics Data System (ADS)
Dimits, A. M.; Cohen, B. I.; Friedman, A.; Joseph, I.; Lodestro, L. L.; Rensink, M. E.; Rognlien, T. D.; Sjogreen, B.; Stotler, D. P.; Umansky, M. V.
2017-10-01
UEDGE has been valuable for modeling transport in the tokamak edge and scrape-off layer due in part to its efficient fully implicit solution of coupled fluid neutrals and plasma models. We are developing an implicit coupling of the kinetic Monte-Carlo (MC) code DEGAS-2, as the neutrals model component, to the UEDGE plasma component, based on an extension of the Jacobian-free Newton-Krylov (JFNK) method to MC residuals. The coupling components build on the methods and coding already present in UEDGE. For the linear Krylov iterations, a procedure has been developed to ``extract'' a good preconditioner from that of UEDGE. This preconditioner may also be used to greatly accelerate the convergence rate of a relaxed fixed-point iteration, which may provide a useful ``intermediate'' algorithm. The JFNK method also requires calculation of Jacobian-vector products, for which any finite-difference procedure is inaccurate when a MC component is present. A semi-analytical procedure that retains the standard MC accuracy and fully kinetic neutrals physics is therefore being developed. Prepared for US DOE by LLNL under Contract DE-AC52-07NA27344 and LDRD project 15-ERD-059, by PPPL under Contract DE-AC02-09CH11466, and supported in part by the U.S. DOE, OFES.
Multi-level comparison of empathy in schizophrenia: an fMRI study of a cartoon task.
Lee, Seung Jae; Kang, Do Hyung; Kim, Chi-Won; Gu, Bon Mi; Park, Ji-Young; Choi, Chi-Hoon; Shin, Na Young; Lee, Jong-Min; Kwon, Jun Soo
2010-02-28
Empathy deficits might play a role in social dysfunction in schizophrenia. However, few studies have investigated the neuroanatomical underpinnings of the subcomponents of empathy in schizophrenia. This study investigated the hemodynamic responses to three subcomponents of empathy in patients with schizophrenia (N=15) and healthy volunteers (N=18) performing an empathy cartoon task during functional magnetic resonance imaging. The experiment used a block design with four conditions: cognitive, emotional, and inhibitory empathy, and physical causality control. Data were analyzed by comparing the blood-oxygen-level-dependent (BOLD) signal activation between the two groups. The cognitive empathy condition activated the right temporal pole to a lesser extent in the patient group than in comparison subjects. In the emotional and inhibitory conditions, the patients showed greater activation in the left insula and in the right middle/inferior frontal cortex, respectively. These findings add to our understanding of impaired empathy in patients with schizophrenia by identifying a multi-level cortical dysfunction that underlies a deficit in each subcomponent of empathy and by highlighting the importance of the fronto-temporal cortical network in the ability to empathize.
Wientjes, Yvonne C J; Bijma, Piter; Vandenplas, Jérémie; Calus, Mario P L
2017-10-01
Different methods are available to calculate multi-population genomic relationship matrices. Since those matrices differ in base population, it is anticipated that the method used to calculate genomic relationships affects the estimates of genetic variances, covariances, and correlations. The aim of this article is to define the multi-population genomic relationship matrix to estimate current genetic variances within, and genetic correlations between, populations. The genomic relationship matrix containing two populations consists of four blocks: one block for population 1, one block for population 2, and two blocks for relationships between the populations. It is known from the literature that by using current allele frequencies to calculate genomic relationships within a population, current genetic variances are estimated. In this article, we theoretically derived the properties of the genomic relationship matrix to estimate genetic correlations between populations and validated them using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated unbiasedly even though the estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix which directly estimates current genetic variances as well as genetic correlations between populations.
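The four-block structure and the scaling-factor property described above can be sketched numerically (a VanRaden-style centering with current allele frequencies is assumed here; the genotypes and all parameter choices are illustrative, not the article's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative genotypes (0/1/2) for two populations sharing m markers.
m, n1, n2 = 200, 20, 20
Z1 = rng.integers(0, 3, size=(n1, m)).astype(float)
Z2 = rng.integers(0, 3, size=(n2, m)).astype(float)

# Centre each population with its own current allele frequencies.
p1 = Z1.mean(axis=0) / 2.0
p2 = Z2.mean(axis=0) / 2.0
W1 = Z1 - 2.0 * p1
W2 = Z2 - 2.0 * p2

# Within-population scaling factors (VanRaden-style).
c1 = 2.0 * np.sum(p1 * (1.0 - p1))
c2 = 2.0 * np.sum(p2 * (1.0 - p2))
# The property from the abstract: the across-population scaling factor
# equals the product of the square roots of the within-population ones.
c12 = np.sqrt(c1) * np.sqrt(c2)

# Assemble the four-block multi-population relationship matrix.
G = np.block([[W1 @ W1.T / c1,  W1 @ W2.T / c12],
              [W2 @ W1.T / c12, W2 @ W2.T / c2]])
print(G.shape)
```

With this choice of `c12`, the two off-diagonal blocks are transposes of each other, so `G` is symmetric as a relationship matrix must be.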
Huang, Chao-Tsung; Wang, Yu-Wen; Huang, Li-Ren; Chin, Jui; Chen, Liang-Gee
2017-02-01
Digital refocusing has a tradeoff between complexity and quality when using sparsely sampled light fields for low-storage applications. In this paper, we propose a fast physically correct refocusing algorithm to address this issue in a twofold way. First, view interpolation is adopted to provide photorealistic quality at infocus-defocus hybrid boundaries. To address its conventionally high complexity, we devised a fast line-scan method specifically for refocusing, and its 1D kernel can be 30× faster than the benchmark View Synthesis Reference Software (VSRS)-1D-Fast. Second, we propose a block-based multi-rate processing flow for accelerating purely infocused or defocused regions, and a further 3–34× speedup can be achieved for high-resolution images. All candidate blocks of variable sizes can interpolate different numbers of rendered views and perform refocusing in different subsampled layers. To avoid visible aliasing and block artifacts, we determine these parameters and the simulated aperture filter through a localized filter response analysis using defocus blur statistics. The final quadtree block partitions are then optimized in terms of computation time. Extensive experimental results are provided to show superior refocusing quality and fast computation speed. In particular, the run time is comparable with that of conventional single-image blurring, which causes serious boundary artifacts.
Super-pixel extraction based on multi-channel pulse coupled neural network
NASA Astrophysics Data System (ADS)
Xu, GuangZhu; Hu, Song; Zhang, Liu; Zhao, JingJing; Fu, YunXia; Lei, BangJun
2018-04-01
Super-pixel extraction techniques group pixels to form over-segmented image blocks according to the similarity among pixels. Compared with traditional pixel-based methods, image description based on super-pixels requires less computation, is easier to interpret, and has been widely used in image processing and computer vision applications. The pulse coupled neural network (PCNN) is a biologically inspired model, which stems from the phenomenon of synchronous pulse release in the visual cortex of cats. Each PCNN neuron can correspond to a pixel of an input image, and the dynamic firing pattern of each neuron contains both the pixel feature information and its spatial context. In this paper, a new color super-pixel extraction algorithm based on a multi-channel pulse coupled neural network (MPCNN) is proposed. The algorithm adopts the block-dividing idea of the SLIC algorithm: the image is first divided into blocks of equal size. Then, for each image block, the pixels adjacent to each seed with similar color are classified as a group, called a super-pixel. Finally, post-processing is applied to those pixels or pixel blocks that have not been grouped. Experiments show that the proposed method can adjust the number of super-pixels and the segmentation precision by setting parameters, and has good potential for super-pixel extraction.
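A minimal sketch of the block-seeding and color-grouping idea described above (SLIC-style grid initialization plus a simple color threshold, not the MPCNN model itself; all names and thresholds are illustrative):

```python
import numpy as np

def init_seeds(h, w, block):
    """Place one seed at the centre of each block-by-block tile,
    as in SLIC-style grid initialization."""
    ys = np.arange(block // 2, h, block)
    xs = np.arange(block // 2, w, block)
    return [(y, x) for y in ys for x in xs]

def assign_labels(img, seeds, color_tol):
    """Assign each pixel to the spatially nearest seed whose color lies
    within color_tol (Euclidean distance in color space); pixels that
    match no seed keep label -1 for a post-processing pass."""
    h, w, _ = img.shape
    labels = -np.ones((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            best, best_d = -1, np.inf
            for i, (sy, sx) in enumerate(seeds):
                color_dist = np.linalg.norm(img[y, x] - img[sy, sx])
                space_dist = np.hypot(y - sy, x - sx)
                if color_dist <= color_tol and space_dist < best_d:
                    best, best_d = i, space_dist
            labels[y, x] = best
    return labels

# Tiny synthetic image: left half red, right half blue.
img = np.zeros((8, 8, 3))
img[:, :4] = [1.0, 0.0, 0.0]
img[:, 4:] = [0.0, 0.0, 1.0]
labels = assign_labels(img, init_seeds(8, 8, 4), color_tol=0.5)
print(np.unique(labels))
```

Each of the four 4×4 tiles contributes one seed, and pixels attach only to seeds of matching color, so the two color regions split cleanly into four super-pixels.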
Robustness, Death of Spiral Wave in the Network of Neurons under Partial Ion Channel Block
NASA Astrophysics Data System (ADS)
Ma, Jun; Huang, Long; Wang, Chun-Ni; Pu, Zhong-Sheng
2013-02-01
The development of a spiral wave in a two-dimensional square array due to partial ion channel block (potassium, sodium) is investigated. The dynamics of each node is described by a Hodgkin–Huxley neuron, and these neurons are coupled with nearest-neighbor connections. The parameter ratio xNa (and xK), which defines the ratio of the number of working sodium (potassium) ion channels to the total number of sodium (potassium) ion channels, is used to measure the conductance shift induced by channel block. The distribution of the statistical variable R in the two-parameter phase space (parameter ratio vs. poisoned area) is extensively calculated to mark the parameter region for the transition of the spiral wave induced by partial ion channel block; the area with smaller synchronization factors R is associated with the parameter region in which the spiral wave stays alive and robust to channel poisoning. The spiral wave stays alive when the poisoned area (potassium or sodium) and the degree of intoxication are small; distinct transitions (death, coexistence of several spiral waves, or emergence of a multi-arm spiral wave) occur at moderate ratios xNa (and xK) when the size of the blocked area exceeds certain thresholds. Breakup of the spiral wave occurs and multi-arm spiral waves are observed when channel noise is considered.
Kharche, Sanjay R.; So, Aaron; Salerno, Fabio; Lee, Ting-Yim; Ellis, Chris; Goldman, Daniel; McIntyre, Christopher W.
2018-01-01
Dialysis prolongs life but augments cardiovascular mortality. Imaging data suggests that dialysis increases myocardial blood flow (BF) heterogeneity, but its causes remain poorly understood. A biophysical model of human coronary vasculature was used to explain the imaging observations, and highlight causes of coronary BF heterogeneity. Post-dialysis CT images from patients under control, pharmacological stress (adenosine), therapy (cooled dialysate), and adenosine and cooled dialysate conditions were obtained. The data presented disparate phenotypes. To dissect vascular mechanisms, a 3D human vasculature model based on known experimental coronary morphometry and a space filling algorithm was implemented. Steady state simulations were performed to investigate the effects of altered aortic pressure and blood vessel diameters on myocardial BF heterogeneity. Imaging showed that stress and therapy potentially increased mean and total BF, while reducing heterogeneity. BF histograms of one patient showed multi-modality. Using the model, it was found that total coronary BF increased as coronary perfusion pressure was increased. BF heterogeneity was differentially affected by large or small vessel blocking. BF heterogeneity was found to be inversely related to small blood vessel diameters. Simulation of large artery stenosis indicates that BF became heterogeneous (increased relative dispersion) and gave multi-modal histograms. The total transmural BF as well as transmural BF heterogeneity were reduced due to large artery stenosis, generating large patches of very low BF regions downstream. Blocking of arteries at various orders showed that blocking larger arteries results in multi-modal BF histograms and large patches of low BF, whereas smaller artery blocking results in augmented relative dispersion and fractal dimension. Transmural heterogeneity was also affected.
Finally, the effects of augmented aortic pressure in the presence of blood vessel blocking shows differential effects on BF heterogeneity as well as transmural BF. Improved aortic blood pressure may improve total BF. Stress and therapy may be effective if they dilate small vessels. A potential cause for the observed complex BF distributions (multi-modal BF histograms) may indicate existing large vessel stenosis. The intuitive BF heterogeneity methods used can be readily used in clinical studies. Further development of the model and methods will permit personalized assessment of patient BF status. PMID:29867555
Links between North Atlantic atmospheric blocking and recent trends in European winter precipitation
NASA Astrophysics Data System (ADS)
Ummenhofer, Caroline; Seo, Hyodae; Kwon, Young-Oh; Joyce, Terrence
2015-04-01
European precipitation has sustained robust trends during wintertime (January - March) over recent decades. Central, western, and northern Europe have become wetter by an average 0.1-0.3% per annum for the period 1901-2010, while southern Europe, including the Iberian Peninsula, much of Italy and the Balkan States, has sustained drying of -0.2% per annum or more over the same period. The overall pattern is consistent across different observational precipitation products, while the magnitude of the precipitation trends varies amongst data sets. Using cluster analysis, which identifies recurrent states (or regimes) of European winter precipitation by grouping them according to an objective similarity criterion, changes in the frequency of dominant winter precipitation patterns over the past century are evaluated. Considerable multi-decadal variability exists in the frequency of dominant winter precipitation patterns: more recent decades are characterised by significantly fewer winters with anomalous wet conditions over southern, western, and central Europe. In contrast, winters with dry conditions in western and southern Europe, but above-average rainfall in western Scandinavia and the northern British Isles, have been more common recently. We evaluate the associated multi-decadal large-scale circulation changes across the broader extratropical North Atlantic region, which accompany the observed wintertime precipitation variability using the 20th Century reanalysis product. Some influence of the North Atlantic Oscillation (NAO) is apparent in modulating the frequency of dominant precipitation patterns. However, recent trends in the characteristics of atmospheric blocking across the North Atlantic sector indicate a change in the dominant blocking centres (near Greenland, the British Isles, and west of the Iberian Peninsula). 
Associated changes in sea level pressure, storm track position and strength, and oceanic heat fluxes across the North Atlantic region are also addressed.
FERMI: a digital Front End and Readout MIcrosystem for high resolution calorimetry
NASA Astrophysics Data System (ADS)
Alexanian, H.; Appelquist, G.; Bailly, P.; Benetta, R.; Berglund, S.; Bezamat, J.; Blouzon, F.; Bohm, C.; Breveglieri, L.; Brigati, S.; Cattaneo, P. W.; Dadda, L.; David, J.; Engström, M.; Genat, J. F.; Givoletti, M.; Goggi, V. G.; Gong, S.; Grieco, G. M.; Hansen, M.; Hentzell, H.; Holmberg, T.; Höglund, I.; Inkinen, S. J.; Kerek, A.; Landi, C.; Ledortz, O.; Lippi, M.; Lofstedt, B.; Lund-Jensen, B.; Maloberti, F.; Mutz, S.; Nayman, P.; Piuri, V.; Polesello, G.; Sami, M.; Savoy-Navarro, A.; Schwemling, P.; Stefanelli, R.; Sundblad, R.; Svensson, C.; Torelli, G.; Vanuxem, J. P.; Yamdagni, N.; Yuan, J.; Ödmark, A.; Fermi Collaboration
1995-02-01
We present a digital solution for the front-end electronics of high resolution calorimeters at future colliders. It is based on analogue signal compression, high speed A/D converters, a fully programmable pipeline and a digital signal processing (DSP) chain with local intelligence and system supervision. This digital solution is aimed at providing maximal front-end processing power by performing waveform analysis using DSP methods. For the system integration of the multichannel device a silicon-on-silicon multi-chip module (MCM) has been adopted. This solution allows a high level of integration of complex analogue and digital functions, with excellent flexibility in mixing technologies for the different functional blocks. This type of multichip integration provides a high degree of reliability and programmability at both the function and the system level, with the additional possibility of customising the microsystem to detector-specific requirements. For enhanced reliability in high radiation environments, fault tolerance strategies, i.e. redundancy, reconfigurability, majority voting and coding for error detection and correction, are integrated into the design.
Multi-equilibrium property of metabolic networks: SSI module.
Lei, Hong-Bo; Zhang, Ji-Feng; Chen, Luonan
2011-06-20
Revealing the multi-equilibrium property of a metabolic network is a fundamental and important topic in systems biology. Due to the complexity of the metabolic network, it is generally a difficult task to study the problem as a whole from both analytical and numerical viewpoints. On the other hand, the structure-oriented modularization idea is a good choice to overcome such a difficulty, i.e. decomposing the network into several basic building blocks and then studying the whole network through investigating the dynamical characteristics of the basic building blocks and their interactions. The single substrate and single product with inhibition (SSI) metabolic module is one type of basic building block of metabolic networks, and its multi-equilibrium property has an important influence on that of the whole metabolic network. In this paper, we describe what the SSI metabolic module is, characterize the rates of the metabolic reactions by Hill kinetics and give a unified model for SSI modules by using a set of nonlinear ordinary differential equations with multiple variables. Specifically, a sufficient and necessary condition is first given to describe the injectivity of a class of nonlinear systems, and then the sufficient condition is used to study the multi-equilibrium property of SSI modules. As a main theoretical result, for the SSI modules in which each reaction has no more than one inhibitor, a sufficient condition is derived to rule out multiple equilibria, i.e. the Jacobian matrix of its rate function is nonsingular everywhere. In summary, we describe SSI modules and give a general modeling framework based on Hill kinetics, and provide a sufficient condition for ruling out multiple equilibria of a key type of SSI module.
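A minimal sketch of a Hill-kinetics reaction rate with an inhibition term, the kind of rate function an SSI model is assembled from (the functional form and all parameter values here are illustrative, not the paper's exact model):

```python
def hill_rate(s, vmax, k, n, inhibitor=0.0, ki=1.0):
    """Hill-type reaction rate with an optional non-competitive
    inhibition factor (illustrative form; parameters hypothetical).

    s         -- substrate concentration
    vmax      -- maximal reaction rate
    k         -- half-saturation constant
    n         -- Hill coefficient (cooperativity)
    inhibitor -- inhibitor concentration
    ki        -- inhibition constant
    """
    activation = vmax * s**n / (k**n + s**n)
    inhibition = 1.0 / (1.0 + inhibitor / ki)
    return activation * inhibition

# The rate rises with substrate and falls when an inhibitor is present.
r0 = hill_rate(1.0, vmax=1.0, k=0.5, n=2)                  # no inhibitor
r1 = hill_rate(1.0, vmax=1.0, k=0.5, n=2, inhibitor=2.0)   # inhibited
print(r0, r1)
```

Stacking such rate terms into a system ds/dt = production − consumption, one equation per metabolite, gives the kind of multi-variable nonlinear ODE model the abstract refers to.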
Multi-boson block factorization of fermions
NASA Astrophysics Data System (ADS)
Giusti, Leonardo; Cè, Marco; Schaefer, Stefan
2018-03-01
The numerical computations of many quantities of theoretical and phenomenological interest are plagued by statistical errors which increase exponentially with the distance of the sources in the relevant correlators. Notable examples are baryon masses and matrix elements, the hadronic vacuum polarization and the light-by-light scattering contributions to the muon g-2, and the form factors of semileptonic B decays. Reliable and precise determinations of these quantities are very difficult if not impractical with state-of-the-art standard Monte Carlo integration schemes. I will review a recent proposal for factorizing the fermion determinant in lattice QCD that leads to a local action in the gauge field and in the auxiliary boson fields. Once combined with the corresponding factorization of the quark propagator, it paves the way for multi-level Monte Carlo integration in the presence of fermions, opening new perspectives in lattice QCD. Exploratory results on the impact on the above-mentioned observables will be presented.
AZTEC: A parallel iterative package for solving linear systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.
1996-12-31
We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
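The iterative-method/preconditioner pairing that AZTEC offers can be illustrated with SciPy (this is not AZTEC's API; SciPy's GMRES with an ILU factorization stands in for AZTEC's GMRES-plus-ILU option):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Build a small sparse test system (1D Poisson-like tridiagonal matrix).
n = 50
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csc")
b = np.ones(n)

# Incomplete LU factorization wrapped as a preconditioning operator,
# analogous to AZTEC's ILU-within-subdomains option.
ilu = spla.spilu(A)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

# Preconditioned GMRES: info == 0 signals convergence.
x, info = spla.gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))
```

As in AZTEC's Newton-scheme reuse, the factorization `ilu` is computed once and can be applied across repeated solves with different right-hand sides.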
A Study of Synchronization Techniques for Optical Communication Systems
NASA Technical Reports Server (NTRS)
Gagliardi, R. M.
1975-01-01
The study of synchronization techniques and related topics in the design of high data rate, deep space, optical communication systems was reported. Data cover: (1) effects of timing errors in narrow pulsed digital optical systems, (2) accuracy of microwave timing systems operating in low powered optical systems, (3) development of improved tracking systems for the optical channel and determination of their tracking performance, (4) development of usable photodetector mathematical models for application to analysis and performance design in communication receivers, and (5) study application of multi-level block encoding to optical transmission of digital data.
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. In this paper, as a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as the number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
NASA National Combustion Code Simulations
NASA Technical Reports Server (NTRS)
Iannetti, Anthony; Davoudzadeh, Farhad
2001-01-01
A systematic effort is in progress to further validate the National Combustion Code (NCC) that has been developed at NASA Glenn Research Center (GRC) for comprehensive modeling and simulation of aerospace combustion systems. The validation efforts include numerical simulation of the gas-phase combustor experiments conducted at the Center for Turbulence Research (CTR), Stanford University, followed by comparison and evaluation of the computed results against the experimental data. Presently, at GRC, a numerical model of the experimental gaseous combustor is being built. The constructed numerical geometry includes the flow development sections for the air annulus and fuel pipe, 24-channel air and fuel swirlers, hub, combustor, and tail pipe. Furthermore, a three-dimensional multi-block grid (1.6 million grid points, three levels of multigrid) is generated. Computational simulation of the gaseous combustor flow field operating on methane fuel has started. The computational domain includes the whole flow regime starting from the fuel pipe and the air annulus, through the 12 air and 12 fuel channels, into the combustion region and through the tail pipe.
Dependency graph for code analysis on emerging architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shashkov, Mikhail Jurievich; Lipnikov, Konstantin
The directed acyclic dependency graph (DAG) is becoming the standard for modern multi-physics codes. The ideal DAG is the true block scheme of a multi-physics code. It is therefore a convenient object for in situ analysis of the cost of computations and of algorithmic bottlenecks related to statistically frequent data motion and the dynamical machine state.
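As an illustrative sketch (the task names are hypothetical, not from any particular code), a dependency DAG for one multi-physics step can be analyzed with a topological sort to expose execution order and the concurrency available at each stage:

```python
from collections import defaultdict, deque

# Hypothetical task graph for one multi-physics time step:
# each edge points from a task to the tasks that depend on it.
edges = {
    "read_state":  ["hydro", "diffusion"],
    "hydro":       ["coupling"],
    "diffusion":   ["coupling"],
    "coupling":    ["write_state"],
    "write_state": [],
}

def topo_levels(graph):
    """Group tasks into levels via Kahn's algorithm: tasks in the same
    level have no mutual dependencies and could run concurrently."""
    indeg = defaultdict(int)
    for src, dsts in graph.items():
        indeg.setdefault(src, 0)
        for d in dsts:
            indeg[d] += 1
    frontier = deque(t for t, k in indeg.items() if k == 0)
    levels = []
    while frontier:
        level = sorted(frontier)   # deterministic order within a level
        frontier.clear()
        levels.append(level)
        for t in level:
            for d in graph.get(t, []):
                indeg[d] -= 1
                if indeg[d] == 0:
                    frontier.append(d)
    return levels

levels = topo_levels(edges)
print(levels)
```

The level structure makes the bottleneck visible at a glance: `hydro` and `diffusion` can run in parallel, while `coupling` serializes the step.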
Divide and control: split design of multi-input DNA logic gates.
Gerasimova, Yulia V; Kolpashchikov, Dmitry M
2015-01-18
Logic gates made of DNA have received significant attention as biocompatible building blocks for molecular circuits. The majority of DNA logic gates, however, are controlled by the minimum number of inputs: one, two or three. Here we report a strategy to design a multi-input logic gate by splitting a DNA construct.
Carpet: Adaptive Mesh Refinement for the Cactus Framework
NASA Astrophysics Data System (ADS)
Schnetter, Erik; Hawley, Scott; Hawke, Ian
2016-11-01
Carpet is an adaptive mesh refinement and multi-patch driver for the Cactus Framework (ascl:1102.013). Cactus is a software framework for solving time-dependent partial differential equations on block-structured grids, and Carpet acts as driver layer providing adaptive mesh refinement, multi-patch capability, as well as parallelization and efficient I/O.
The two-loop symbol of all multi-Regge regions
Bargheer, Till; Papathanasiou, Georgios; Schomerus, Volker
2016-05-02
Here, we study the symbol of the two-loop n-gluon MHV amplitude for all Mandelstam regions in multi-Regge kinematics in N=4 super Yang-Mills theory. While the number of distinct Mandelstam regions grows exponentially with n, the increase of independent symbols turns out to be merely quadratic. We uncover how to construct the symbols for any number of external gluons from just two building blocks which are naturally associated with the six- and seven-gluon amplitude, respectively. The second building block is entirely new, and in addition to its symbol, we also construct a prototype function that correctly reproduces all terms of maximal functional transcendentality.
Ab-Ghani, Zuryati; Jaafar, Wahyuni; Foo, Siew Fon; Ariffin, Zaihan; Mohamad, Dasmawati
2015-01-01
To evaluate the shear bond strength between the dentin substrate and computer-aided design and computer-aided manufacturing (CAD/CAM) feldspathic ceramic and nano resin ceramic blocks cemented with resin cement. Sixty cuboidal blocks (5 mm × 5 mm × 5 mm) were fabricated in equal numbers from the feldspathic ceramic CEREC(®) Blocs PC and the nano resin ceramic Lava™ Ultimate, and randomly divided into six groups (n = 10). Each block was cemented to the dentin of 60 extracted human premolars using Variolink(®) II/Syntac Classic (multi-step etch-and-rinse adhesive bonding), NX3 Nexus(®) (two-step etch-and-rinse adhesive bonding) and RelyX™ U200 self-adhesive cement. All specimens were thermocycled, and shear bond strength testing was done using a universal testing machine at a crosshead speed of 1.0 mm/min. Data were analyzed using one-way ANOVA. The combination of CEREC(®) Blocs PC and Variolink(®) II showed the highest mean shear bond strength (8.71 MPa), while the lowest (2.06 MPa) was observed with Lava™ Ultimate and RelyX™ U200. There was no significant difference in mean shear bond strength between the different blocks. Variolink(®) II cement using multi-step etch-and-rinse adhesive bonding provided a higher shear bond strength than the self-adhesive cement RelyX U200. The shear bond strength was not affected by the type of block used.
Irregular water supply, household usage and dengue: a bio-social study in the Brazilian Northeast.
Caprara, Andrea; Lima, José Wellington de Oliveira; Marinho, Alice Correia Pequeno; Calvasina, Paola Gondim; Landim, Lucyla Paes; Sommerfeld, Johannes
2009-01-01
Despite increased vector control efforts, dengue fever remains endemic in Fortaleza, Northeast Brazil, where sporadic epidemic outbreaks have occurred since 1986. Multiple factors affect vector ecology, such as social policy, migration, urbanization, city water supply, garbage disposal and housing conditions, as well as community-level understanding of the disease and related practices. This descriptive study used a multi-disciplinary approach that bridged anthropology and entomology. A multiple case study design was adopted to include research in six study areas, defined as blocks. The water supply is irregular in households from both under-privileged and privileged areas; however, clear differences exist. In the more privileged blocks, several homes are not connected to the public water system but have a well and pump system, and therefore irregularity of supply does not affect them. In households from under-privileged blocks, where the water supply is irregular, the frequent use of water containers such as water tanks, cisterns, barrels and pots creates environmental conditions with a greater number of breeding areas. In under-privileged homes, there are more possible breeding areas and environmental conditions that may improve the chances of Aedes aegypti survival.
A study on community-based approaches to reduce leprosy stigma in India.
Raju, M S; Rao, P S S; Mutatkar, R K
2008-01-01
There is a global awareness that reduction of leprosy stigma has not kept pace with the technological developments and the resulting cognitive changes pertaining to leprosy, which can be attributed to a lack of active community participation in the programmes. With the major aim of identifying the best methods using active participation of the society, the Leprosy Mission in India initiated, during 2005, a multi-state community-based interventional trial of leprosy stigma reduction in two similar rural blocks located beyond 25 km from each of three hospitals in three states of India: Faizabad in Uttar Pradesh, Purulia in West Bengal and Champa in Chhattisgarh. A baseline survey was done which confirmed a high level of leprosy stigma. A stigma reduction organizing committee (SROC) was formed in each village, ten per block, for a total of 60 SROCs across the three states. One trained social worker, appointed by the project as community organizer in each block, acted as a facilitator for all the stigma reduction activities taken up by the committees. The outcome of the project shows that the SROCs' interventions are well accepted by the communities. Education and counseling through SROC members in local circumstances are feasible and effective.
Collective operations in a file system based execution model
Shinde, Pravin; Van Hensbergen, Eric
2013-02-12
A mechanism is provided for group communications using a MULTI-PIPE synthetic file system. A master application creates a multi-pipe synthetic file in the MULTI-PIPE synthetic file system, the master application indicating a multi-pipe operation to be performed. The master application then writes a header-control block of the multi-pipe synthetic file specifying at least one of a multi-pipe synthetic file system name, a message type, a message size, a specific destination, or a specification of the multi-pipe operation. Any other application participating in the group communications then opens the same multi-pipe synthetic file. A MULTI-PIPE file system module then implements the multi-pipe operation as identified by the master application. The master application and the other applications then either read or write operation messages to the multi-pipe synthetic file and the MULTI-PIPE synthetic file system module performs appropriate actions.
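The header-control block described above can be pictured as a fixed binary layout that the master application writes and every participant parses before exchanging operation messages. A minimal sketch, assuming an illustrative field layout (the field names and sizes below are hypothetical, not the patent's actual format):

```python
import struct

# Hypothetical header-control block layout (illustrative only):
#   16s  multi-pipe file system name, NUL-padded
#   i    message type (e.g. 0 = broadcast, 1 = reduce)
#   i    message size in bytes
#   i    destination rank (-1 = all participants)
HEADER_FMT = "16siii"

def pack_header(fs_name: str, msg_type: int, msg_size: int, dest: int) -> bytes:
    """Serialize the header the master application writes first."""
    return struct.pack(HEADER_FMT, fs_name.encode().ljust(16, b"\0"),
                       msg_type, msg_size, dest)

def unpack_header(blob: bytes):
    """Parse the header a participating application reads back."""
    name, msg_type, msg_size, dest = struct.unpack(HEADER_FMT, blob)
    return name.rstrip(b"\0").decode(), msg_type, msg_size, dest
```

A participating application would read and parse these bytes from the opened multi-pipe synthetic file before reading or writing operation messages.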
The need to look at antibiotic resistance from a health systems perspective
Vlad, Ioana
2014-01-01
Current use, misuse, and overuse of antibiotics raise dangers and ethical dilemmas that cannot be solved in isolation, exclusively within a health system building block or even within the health sector only. There is a need to tackle antibiotic resistance emergence and containment on levels ranging from individuals, households, and the communities, to health care facilities, the entire health sector, and finally to national and global levels. We analyse emergence of antibiotic resistance based on interdependencies between health systems resources. We further go beyond the health system building blocks, to look at determinants of antibiotic resistance referring to wider global dynamics. Multi-level governance is the key for successful action in containment strategies. This will involve, in a comprehensive way, patients, health facilities where they receive care, health systems to which these facilities pertain, and the wider national context as well as the global community that influences the functioning of these health systems. In order to be effective and sustainable in both high and low-resource settings, implementation of containment interventions at all these levels needs to be managed based on existing theories and models of change. Although ministries of health and the global community must provide vision and support, it is important to keep in mind that containment interventions for antibiotic resistance will target individuals, consumers as well as providers. PMID:24673267
NASA Astrophysics Data System (ADS)
Averkin, Sergey N.; Gatsonis, Nikolaos A.
2018-06-01
An unstructured electrostatic Particle-In-Cell (EUPIC) method is developed on arbitrary tetrahedral grids for simulation of plasmas bounded by arbitrary geometries. The electric potential in EUPIC is obtained on cell vertices from a finite volume Multi-Point Flux Approximation of Gauss' law using the indirect dual cell with Dirichlet, Neumann and external circuit boundary conditions. The resulting matrix equation for the nodal potential is solved with a restarted generalized minimal residual method (GMRES) and an ILU(0) preconditioner, parallelized using a combination of node coloring and level scheduling approaches. The electric field on vertices is obtained using the gradient theorem applied to the indirect dual cell. The algorithms for injection, particle loading, particle motion, and particle tracking are parallelized for unstructured tetrahedral grids. The potential solver, electric field evaluation, loading, and scatter-gather algorithms are verified using analytic solutions for test cases subject to the Laplace and Poisson equations. Grid sensitivity analysis examines the L2 and L∞ norms of the relative error in potential, field, and charge density as a function of edge-averaged and volume-averaged cell size. Analysis shows second order of convergence for the potential and first order of convergence for the electric field and charge density. Temporal sensitivity analysis is performed and the momentum and energy conservation properties of the particle integrators in EUPIC are examined. The effects of cell size and timestep on the heating, slowing-down and deflection times are quantified. The heating, slowing-down and deflection times are found to be almost linearly dependent on the number of particles per cell. EUPIC simulations of current collection by cylindrical Langmuir probes in collisionless plasmas show good agreement with previous experimentally validated numerical results.
These simulations were also used in a parallelization efficiency investigation. Results show that EUPIC has an efficiency of more than 80% when the simulation is performed on a single CPU of a non-uniform memory access node, with efficiency decreasing as the number of threads increases. EUPIC is applied to the simulation of multi-species plasma flow over a geometrically complex CubeSat in Low Earth Orbit. The EUPIC potential and flowfield distributions around the CubeSat exhibit features that are consistent with previous simulations over simpler geometrical bodies.
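The potential solve described above pairs restarted GMRES with an ILU(0) preconditioner. A minimal, dependency-light sketch of preconditioned GMRES (no restarts, and a simple Jacobi preconditioner standing in for ILU(0)):

```python
import numpy as np

def gmres(A, b, M_inv, tol=1e-10, max_iter=50):
    """Minimal left-preconditioned GMRES (no restarts); M_inv approximates A^-1."""
    n = len(b)
    x = np.zeros(n)
    r = M_inv @ b                                   # preconditioned initial residual
    beta = np.linalg.norm(r)
    Q = np.zeros((n, max_iter + 1))                 # Arnoldi basis
    H = np.zeros((max_iter + 1, max_iter))          # Hessenberg matrix
    Q[:, 0] = r / beta
    for k in range(max_iter):
        w = M_inv @ (A @ Q[:, k])                   # apply preconditioned operator
        for j in range(k + 1):                      # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ w
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2); e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        x = Q[:, :k + 1] @ y                        # current least-squares iterate
        if np.linalg.norm(A @ x - b) < tol or H[k + 1, k] < 1e-14:
            break
        Q[:, k + 1] = w / H[k + 1, k]
    return x
```

In practice a production code would use a library implementation with restarts and a true ILU(0) factorization; this sketch only illustrates the Arnoldi/least-squares structure of the method.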
The Acquisition and Retention of Visual Aircraft Recognition Skills
1976-11-01
instructed with a printed version of the GOAR imagery. Students were given multi-view cards and flashcards of each aircraft. The multi-view cards had the...on the other. Each flashcard presented one aspect of an aircraft on the front with its nomenclature on the back. The training system designed for...included the five-image, multi-view cards and single-image flashcards. These materials were produced for 80 aircraft, which were grouped into 4 blocks
Multistage Computerized Adaptive Testing with Uniform Item Exposure
ERIC Educational Resources Information Center
Edwards, Michael C.; Flora, David B.; Thissen, David
2012-01-01
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Electro-Optic Computing Architectures. Volume I
1998-02-01
The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit (OIU
Multi-blocking strategies for the INS3D incompressible Navier-Stokes code
NASA Technical Reports Server (NTRS)
Gatlin, Boyd
1990-01-01
With the continuing development of bigger and faster supercomputers, computational fluid dynamics (CFD) has become a useful tool for real-world engineering design and analysis. However, the number of grid points necessary to resolve realistic flow fields numerically can easily exceed the memory capacity of available computers. In addition, geometric shapes of flow fields, such as those in the Space Shuttle Main Engine (SSME) power head, may be impossible to fill with continuous grids upon which to obtain numerical solutions to the equations of fluid motion. The solution to this dilemma is simply to decompose the computational domain into subblocks of manageable size. Computer codes that are single-block by construction can be modified to handle multiple blocks, but ad-hoc changes in the FORTRAN have to be made for each geometry treated. For engineering design and analysis, what is needed is generalization so that the blocking arrangement can be specified by the user. INS3D is a computer program for the solution of steady, incompressible flow problems. It is used frequently to solve engineering problems in the CFD Branch at Marshall Space Flight Center. INS3D uses an implicit solution algorithm and the concept of artificial compressibility to provide the necessary coupling between the pressure field and the velocity field. The development of generalized multi-block capability in INS3D is described.
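The blocking idea described above, decomposing the computational domain into subblocks of manageable size that exchange boundary information, can be sketched in one dimension (an illustration of the concept, not INS3D's actual blocking logic):

```python
def split_into_blocks(n_points, max_block, overlap=1):
    """Partition a 1-D grid of n_points into index ranges of at most
    max_block interior points, each extended by `overlap` ghost points
    on interior faces for inter-block boundary exchange."""
    blocks, start = [], 0
    while start < n_points:
        end = min(start + max_block, n_points)
        lo = max(start - overlap, 0)         # ghost layer on the left
        hi = min(end + overlap, n_points)    # ghost layer on the right
        blocks.append((lo, hi))
        start = end
    return blocks
```

For a 3D solver the same splitting would be applied per coordinate direction, and the ghost regions are where neighboring blocks exchange solution data each iteration.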
NASA Astrophysics Data System (ADS)
Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua
2014-12-01
Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when coordinating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we use the CPU and GPU collaboratively for HOSTA instead of a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile, the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells.
To the best of our knowledge, these are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
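The CPU-GPU load balance described above amounts to splitting cells in proportion to relative device speed while respecting the GPU's smaller memory. A hypothetical sketch of such a split (not the paper's actual scheme):

```python
def split_workload(total_cells, gpu_speed, cpu_speed, gpu_capacity):
    """Assign cells to the GPU in proportion to its relative throughput,
    capped by its (smaller) memory capacity; the CPU takes the rest.
    Speeds are in cells per unit time; capacity is in cells."""
    ideal_gpu = round(total_cells * gpu_speed / (gpu_speed + cpu_speed))
    gpu_cells = min(ideal_gpu, gpu_capacity)
    return gpu_cells, total_cells - gpu_cells
```

When the GPU's proportional share exceeds its memory, the surplus spills to the CPU, which is how a memory-capped split can both enlarge the per-node problem size and keep both devices busy.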
Efficient Preconditioning for the p-Version Finite Element Method in Two Dimensions
1989-10-01
paper, we study fast parallel preconditioners for systems of equations arising from the p-version finite element method. The p-version finite element...computations and the solution of a relatively small global auxiliary problem. We study two different methods. In the first (Section 3), the global...20], will be studied in the next section. Problem (3.12) is obviously much more easily solved than the original problem, and the procedure is highly
Scalable smoothing strategies for a geometric multigrid method for the immersed boundary equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhalla, Amneet Pal Singh; Knepley, Matthew G.; Adams, Mark F.
2016-12-20
The immersed boundary (IB) method is a widely used approach to simulating fluid-structure interaction (FSI). Although explicit versions of the IB method can suffer from severe time step size restrictions, these methods remain popular because of their simplicity and generality. In prior work (Guy et al., Adv Comput Math, 2015), some of us developed a geometric multigrid preconditioner for a stable semi-implicit IB method under Stokes flow conditions; however, this solver methodology used a Vanka-type smoother that presented limited opportunities for parallelization. This work extends this Stokes-IB solver methodology by developing smoothing techniques that are suitable for parallel implementation. Specifically, we demonstrate that an additive version of the Vanka smoother can yield an effective multigrid preconditioner for the Stokes-IB equations, and we introduce an efficient Schur complement-based smoother that is also shown to be effective for the Stokes-IB equations. We investigate the performance of these solvers for a broad range of material stiffnesses, both for Stokes flows and flows at nonzero Reynolds numbers, and for thick and thin structural models. We show here that linear solver performance degrades with increasing Reynolds number and material stiffness, especially for thin interface cases. Nonetheless, the proposed approaches promise to yield effective solution algorithms, especially at lower Reynolds numbers and at modest-to-high elastic stiffnesses.
NASA Astrophysics Data System (ADS)
Moghaderi, Hamid; Dehghan, Mehdi; Donatelli, Marco; Mazza, Mariarosa
2017-12-01
Fractional diffusion equations (FDEs) are a mathematical tool used for describing some special diffusion phenomena arising in many different applications like porous media and computational finance. In this paper, we focus on a two-dimensional space-FDE problem discretized by means of a second order finite difference scheme obtained as combination of the Crank-Nicolson scheme and the so-called weighted and shifted Grünwald formula. By fully exploiting the Toeplitz-like structure of the resulting linear system, we provide a detailed spectral analysis of the coefficient matrix at each time step, both in the case of constant and variable diffusion coefficients. Such a spectral analysis has a very crucial role, since it can be used for designing fast and robust iterative solvers. In particular, we employ the obtained spectral information to define a Galerkin multigrid method based on the classical linear interpolation as grid transfer operator and damped-Jacobi as smoother, and to prove the linear convergence rate of the corresponding two-grid method. The theoretical analysis suggests that the proposed grid transfer operator is strong enough for working also with the V-cycle method and the geometric multigrid. On this basis, we introduce two computationally favourable variants of the proposed multigrid method and we use them as preconditioners for Krylov methods. Several numerical results confirm that the resulting preconditioning strategies still keep a linear convergence rate.
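The shifted Grünwald formula above is built from the Grünwald-Letnikov coefficients g_k = (-1)^k C(α, k), which satisfy a simple one-term recurrence. A minimal sketch of computing them (the weighting and shifting combination of the actual scheme is omitted):

```python
def grunwald_weights(alpha, n):
    """First n Grünwald-Letnikov coefficients g_k = (-1)^k * C(alpha, k),
    via the standard recurrence g_0 = 1, g_k = (1 - (alpha+1)/k) * g_{k-1}."""
    g = [1.0]
    for k in range(1, n):
        g.append((1.0 - (alpha + 1.0) / k) * g[-1])
    return g
```

For integer α the coefficients reduce to the familiar finite difference stencils, e.g. α = 2 gives (1, -2, 1, 0, ...), which is a quick sanity check on any implementation.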
Fully anisotropic 3-D EM modelling on a Lebedev grid with a multigrid pre-conditioner
NASA Astrophysics Data System (ADS)
Jaysaval, Piyoosh; Shantsev, Daniil V.; de la Kethulle de Ryhove, Sébastien; Bratteland, Tarjei
2016-12-01
We present a numerical algorithm for 3-D electromagnetic (EM) simulations in conducting media with general electric anisotropy. The algorithm is based on the finite-difference discretization of frequency-domain Maxwell's equations on a Lebedev grid, in which all components of the electric field are collocated but half a spatial step staggered with respect to the magnetic field components, which also are collocated. This leads to a system of linear equations that is solved using a stabilized biconjugate gradient method with a multigrid preconditioner. We validate the accuracy of the numerical results for layered and 3-D tilted transverse isotropic (TTI) earth models representing typical scenarios used in the marine controlled-source EM method. It is then demonstrated that not taking into account the full anisotropy of the conductivity tensor can lead to misleading inversion results. For synthetic data corresponding to a 3-D model with a TTI anticlinal structure, a standard vertical transverse isotropic (VTI) inversion is not able to image a resistor, while for a 3-D model with a TTI synclinal structure it produces a false resistive anomaly. However, if the VTI forward solver used in the inversion is replaced by the proposed TTI solver with perfect knowledge of the strike and dip of the dipping structures, the resulting resistivity images become consistent with the true models.
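A TTI conductivity tensor of the kind modeled here can be assembled by rotating a VTI tensor through dip and strike angles. A minimal sketch, assuming one common rotation convention (angle conventions vary between codes):

```python
import numpy as np

def tti_tensor(sigma_h, sigma_v, dip, strike):
    """Full 3x3 conductivity tensor of a TTI medium: the VTI tensor
    diag(sigma_h, sigma_h, sigma_v) rotated by `dip` degrees about y
    and `strike` degrees about z (one plausible convention)."""
    d, s = np.radians(dip), np.radians(strike)
    Ry = np.array([[np.cos(d), 0.0, np.sin(d)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(d), 0.0, np.cos(d)]])
    Rz = np.array([[np.cos(s), -np.sin(s), 0.0],
                   [np.sin(s), np.cos(s), 0.0],
                   [0.0, 0.0, 1.0]])
    R = Rz @ Ry
    return R @ np.diag([sigma_h, sigma_h, sigma_v]) @ R.T
```

The rotation preserves the principal conductivities but fills in off-diagonal entries, which is exactly the general electric anisotropy a VTI-only solver cannot represent.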
Hravnak, Marilyn; Chen, Lujie; Dubrawski, Artur; Bose, Eliezer; Clermont, Gilles; Pinsky, Michael R
2016-12-01
Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits its use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby "cleaning" such data for future modeling. 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data [heart rate (HR), respiratory rate (RR), peripheral arterial oxygen saturation (SpO2) at 1/20 Hz, and noninvasive oscillometric blood pressure (BP)]. Periods when VS crossed stability thresholds defined VS event epochs. Data were divided into Block 1 (the ML training/cross-validation set) and Block 2 (the test set). Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. Block 1 yielded 812 VS events, with 214 (26%) judged by experts as artifact (RR 43%, SpO2 40%, BP 15%, HR 2%). ML algorithms applied to the Block 1 training/cross-validation set (tenfold cross-validation) gave area under the curve (AUC) scores of 0.97 RR, 0.91 BP and 0.76 SpO2. Performance when applied to Block 2 test data was AUC 0.94 RR, 0.84 BP and 0.72 SpO2. ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building.
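The AUC metric reported above can be computed directly as a Mann-Whitney statistic: the probability that a randomly chosen real event is scored above a randomly chosen artifact, with ties counting one half. A minimal sketch of the metric itself (not the study's actual ML pipeline):

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs in which the positive
    is scored higher, counting ties as 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the classifier scores are uninformative, while 1.0 means every real event outranks every artifact.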
Development of the PARVMEC Code for Rapid Analysis of 3D MHD Equilibrium
NASA Astrophysics Data System (ADS)
Seal, Sudip; Hirshman, Steven; Cianciosa, Mark; Wingen, Andreas; Unterberg, Ezekiel; Wilcox, Robert; ORNL Collaboration
2015-11-01
The VMEC three-dimensional (3D) MHD equilibrium code has been used extensively for designing stellarator experiments and analyzing experimental data in such strongly 3D systems. Recent applications of VMEC include 2D systems such as tokamaks (in particular, the D3D experiment), where application of very small (δB/B ~ 10^-3) 3D resonant magnetic field perturbations renders the underlying assumption of axisymmetry invalid. In order to facilitate the rapid analysis of such equilibria (for example, for reconstruction purposes), we have undertaken the task of parallelizing the VMEC code (PARVMEC) to produce a scalable and temporally rapidly convergent equilibrium code for use on parallel distributed memory platforms. The parallelization task naturally splits into three distinct parts: 1) the radial surfaces in the fixed-boundary part of the calculation; 2) the two 2D angular meshes needed to compute the Green's function integrals over the plasma boundary for the free-boundary part of the code; and 3) the block tridiagonal matrix needed to compute the full (3D) pre-conditioner near the final equilibrium state. Preliminary results show that scalability is achieved for tasks 1 and 3, with task 2 nearing completion. The impact of this work on the rapid reconstruction of D3D plasmas using PARVMEC in the V3FIT code will be discussed. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Solving large test-day models by iteration on data and preconditioned conjugate gradient.
Lidauer, M; Strandén, I; Mäntysaari, E A; Pösö, J; Kettunen, A
1999-12-01
A preconditioned conjugate gradient method was implemented into an iteration-on-data program for the estimation of breeding values, and its convergence characteristics were studied. An algorithm was used as a reference in which one fixed effect was solved by the Gauss-Seidel method, and other effects were solved by a second-order Jacobi method. Implementation of the preconditioned conjugate gradient required storing four vectors (size equal to the number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Comparison of algorithms was based on solutions of mixed model equations obtained by a single-trait animal model and a single-trait, random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. The animal model data comprised 665,629 lactation milk yields, and the random regression test-day model data 6,732,765 test-day milk yields. Both models included pedigree information on 1,099,622 animals. The animal model [random regression test-day model] required 122 [305] rounds of iteration to converge with the reference algorithm, but only 88 [149] were required with the preconditioned conjugate gradient. To solve the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
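The solver described above, conjugate gradient preconditioned with diagonal blocks of the coefficient matrix, has a standard structure. A minimal sketch with a plain diagonal (Jacobi) preconditioner standing in for the diagonal blocks:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for symmetric positive-definite A;
    M_inv applies the inverse preconditioner (here a diagonal stand-in
    for the block-diagonal preconditioner of the paper)."""
    x = np.zeros_like(b)
    r = b - A @ x                       # initial residual
    z = M_inv @ r                       # preconditioned residual
    p = z.copy()                        # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # conjugate direction update
        rz = rz_new
    return x
```

The four stored vectors mentioned in the abstract correspond to x, r, z and p here; in an iteration-on-data setting the matrix-vector product A @ p is assembled on the fly from the data file rather than from a stored matrix.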
Wu, S.-S.; Wang, L.; Qiu, X.
2008-01-01
This article presents a deterministic model for sub-block-level population estimation based on the total building volumes derived from geographic information system (GIS) building data and three census block-level housing statistics. To assess the model, we generated artificial blocks by aggregating census block areas and calculating the respective housing statistics. We then applied the model to estimate populations for sub-artificial-block areas and assessed the estimates against census populations of the areas. Our analyses indicate that the average percent error of population estimation for sub-artificial-block areas is comparable to that for sub-census-block areas of the same size relative to their associated blocks. The smaller the sub-block-level areas, the higher the population estimation errors. For example, the average percent error for residential areas is approximately 0.11 percent for 100 percent block areas and 35 percent for 5 percent block areas.
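The model allocates a block's population to sub-block areas in proportion to building volume. A simplified sketch of that allocation (the actual model also incorporates the three census housing statistics, omitted here):

```python
def estimate_population(block_pop, building_volumes, target_ids):
    """Volume-proportional estimate: a sub-block area's population is
    the block population times its share of the block's total building
    volume. `building_volumes` maps building id -> volume; `target_ids`
    lists the buildings inside the sub-block area of interest."""
    total = sum(building_volumes.values())
    share = sum(building_volumes[i] for i in target_ids)
    return block_pop * share / total
```

Because the allocation is purely proportional, estimates for small sub-areas inherit all the local variation in persons per unit volume, which is consistent with the error growth for small areas reported above.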
Geometry modeling and multi-block grid generation for turbomachinery configurations
NASA Technical Reports Server (NTRS)
Shih, Ming H.; Soni, Bharat K.
1992-01-01
An interactive 3D grid generation code, Turbomachinery Interactive Grid genERation (TIGER), was developed for general turbomachinery configurations. TIGER features the automatic generation of multi-block structured grids around multiple blade rows for internal, external, or internal-external turbomachinery flow fields. Utilization of Bézier curves achieves a smooth grid and better orthogonality. TIGER generates the algebraic grid automatically based on geometric information provided by its built-in pseudo-AI algorithm. However, due to the large variation of turbomachinery configurations, this initial grid may not always be as good as desired. TIGER therefore provides graphical user interaction during the process, which allows the user to design, modify, and manipulate the grid, including the capability of elliptic surface grid generation.
Multi-Functional Macromers for Hydrogel Design in Biomedical Engineering and Regenerative Medicine
Hacker, Michael C.; Nawaz, Hafiz Awais
2015-01-01
Contemporary biomaterials are expected to provide tailored mechanical, biological and structural cues to encapsulated or invading cells in regenerative applications. In addition, the degradative properties of the material also have to be adjustable to the desired application. Oligo- or polymeric building blocks that can be further cross-linked into hydrogel networks, here addressed as macromers, appear as the prime option to assemble gels with the necessary degrees of freedom in the adjustment of the mentioned key parameters. Recent developments in the design of multi-functional macromers with two or more chemically different types of functionalities are summarized and discussed in this review illustrating recent trends in the development of advanced hydrogel building blocks for regenerative applications. PMID:26610468
Multi-Functional Macromers for Hydrogel Design in Biomedical Engineering and Regenerative Medicine.
Hacker, Michael C; Nawaz, Hafiz Awais
2015-11-19
Contemporary biomaterials are expected to provide tailored mechanical, biological and structural cues to encapsulated or invading cells in regenerative applications. In addition, the degradative properties of the material also have to be adjustable to the desired application. Oligo- or polymeric building blocks that can be further cross-linked into hydrogel networks, here addressed as macromers, appear as the prime option to assemble gels with the necessary degrees of freedom in the adjustment of the mentioned key parameters. Recent developments in the design of multi-functional macromers with two or more chemically different types of functionalities are summarized and discussed in this review illustrating recent trends in the development of advanced hydrogel building blocks for regenerative applications.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
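One plausible load balancing strategy for grids of unequal size is greedy largest-first assignment to the least-loaded processor. This sketch illustrates that idea only; it is not necessarily one of the three strategies studied in the paper:

```python
import heapq

def balance_grids(grid_sizes, n_procs):
    """Greedy largest-first assignment of grids to processors: sort
    grids by descending point count, then give each to the currently
    least-loaded processor. Returns {grid id: processor id}."""
    heap = [(0, p) for p in range(n_procs)]   # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for gid, size in sorted(grid_sizes.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)         # least-loaded processor
        assignment[gid] = p
        heapq.heappush(heap, (load + size, p))
    return assignment
```

For overset grids the point counts are known up front, so this kind of static partitioning can be computed once before the solve; dynamic strategies would additionally rebalance as interpolation costs shift.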
Numerical study of supersonic combustors by multi-block grids with mismatched interfaces
NASA Technical Reports Server (NTRS)
Moon, Young J.
1990-01-01
A three dimensional, finite rate chemistry, Navier-Stokes code was extended to a multi-block code with mismatched interface for practical calculations of supersonic combustors. To ensure global conservation, a conservative algorithm was used for the treatment of mismatched interfaces. The extended code was checked against one test case, i.e., a generic supersonic combustor with transverse fuel injection, examining solution accuracy, convergence, and local mass flux error. After testing, the code was used to simulate the chemically reacting flow fields in a scramjet combustor with parallel fuel injectors (unswept and swept ramps). Computational results were compared with experimental shadowgraph and pressure measurements. Fuel-air mixing characteristics of the unswept and swept ramps were compared and investigated.
NASA Astrophysics Data System (ADS)
Nagarajan, S. G.; Srinivasan, M.; Aravinth, K.; Ramasamy, P.
2018-04-01
Transient simulations were carried out to analyze the heat transfer properties of a Directional Solidification (DS) furnace. The simulation results revealed that an additional heat exchanger block under the bottom insulation of the DS furnace enhances control of the solidification of the silicon melt. A controlled heat extraction rate during solidification of the silicon melt is a prerequisite for growing good-quality ingots, and this was achieved by the additional heat exchanger block: a water-circulating plate placed under the bottom insulation. The heat flux analysis of the DS system and the temperature distribution studies of the grown ingot confirm that the additional heat exchanger block installed in the DS system provides an additional benefit to the mc-Si ingot.
Improved 3-D turbomachinery CFD algorithm
NASA Technical Reports Server (NTRS)
Janus, J. Mark; Whitfield, David L.
1988-01-01
The building blocks of a computer algorithm developed for the time-accurate flow analysis of rotating machines are described. The flow model is a finite-volume method utilizing a high-resolution approximate Riemann solver for interface flux definitions. The block LU implicit numerical scheme possesses apparent unconditional stability. Multi-block composite gridding is used to partition the field into a specified arrangement in an orderly manner. Block interfaces, including dynamic interfaces, are treated so as to mimic interior block communication. Special attention is given to reducing in-core memory requirements by placing the burden on secondary storage media. Broad applicability is implied, although the results presented are restricted to an even-blade-count configuration. Several other configurations are presently under investigation, the results of which will appear in subsequent publications.
The iMars web-GIS - spatio-temporal data queries and single image web map services
NASA Astrophysics Data System (ADS)
Walter, S. H. G.; Steikert, R.; Schreiner, B.; Sidiropoulos, P.; Tao, Y.; Muller, J.-P.; Putry, A. R. D.; van Gasselt, S.
2017-09-01
We introduce a new approach for a system dedicated to planetary surface change detection by simultaneous visualisation of single-image time series in a multi-temporal context. In the context of the EU FP-7 iMars project we process and ingest vast amounts of automatically co-registered (ACRO) images. The basis of the co-registration is the high-precision HRSC multi-orbit quadrangle image mosaics, which are based on bundle-block-adjusted multi-orbit HRSC DTMs.
Supramolecular Lego assembly towards three-dimensional multi-responsive hydrogels.
Ma, Chunxin; Li, Tiefeng; Zhao, Qian; Yang, Xuxu; Wu, Jingjun; Luo, Yingwu; Xie, Tao
2014-08-27
Inspired by the assembly of Lego toys, hydrogel building blocks with heterogeneous responsiveness are assembled utilizing macroscopic supramolecular recognition as the adhesion force. The Lego hydrogel provides 3D transformation upon pH variation. After disassembly of the building blocks by changing the oxidation state, they can be re-assembled into a completely new shape.
Two-dimensional honeycomb network through sequence-controlled self-assembly of oligopeptides.
Abb, Sabine; Harnau, Ludger; Gutzler, Rico; Rauschenbach, Stephan; Kern, Klaus
2016-01-12
The sequence of a peptide programs its self-assembly and hence the expression of specific properties through non-covalent interactions. A large variety of peptide nanostructures has been designed employing different aspects of these non-covalent interactions, such as dispersive interactions, hydrogen bonding or ionic interactions. Here we demonstrate the sequence-controlled fabrication of molecular nanostructures using peptides as bio-organic building blocks for two-dimensional (2D) self-assembly. Scanning tunnelling microscopy reveals changes from compact or linear assemblies (angiotensin I) to long-range ordered, chiral honeycomb networks (angiotensin II) as a result of removal of steric hindrance by sequence modification. Guided by our observations, molecular dynamic simulations yield atomistic models for the elucidation of interpeptide-binding motifs. This new approach to 2D self-assembly on surfaces grants insight at the atomic level that will enable the use of oligo- and polypeptides as large, multi-functional bio-organic building blocks, and opens a new route towards rationally designed, bio-inspired surfaces.
Programming methodology for a general purpose automation controller
NASA Technical Reports Server (NTRS)
Sturzenbecker, M. C.; Korein, J. U.; Taylor, R. H.
1987-01-01
The General Purpose Automation Controller is a multi-processor architecture for automation programming. A methodology has been developed whose aim is to simplify the task of programming distributed real-time systems for users in research or manufacturing. Programs are built by configuring function blocks (low-level computations) into processes using data flow principles. These processes are activated through the verb mechanism. Verbs are divided into two classes: those which support devices, such as robot joint servos, and those which perform actions on devices, such as motion control. This programming methodology was developed in order to achieve the following goals: (1) specifications for real-time programs which are to a high degree independent of hardware considerations such as processor, bus, and interconnect technology; (2) a component approach to software, so that software required to support new devices and technologies can be integrated by reconfiguring existing building blocks; (3) resistance to error and ease of debugging; and (4) a powerful command language interface.
Architecting the Finite Element Method Pipeline for the GPU.
Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T
2014-02-01
The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as many-core streaming processors like the graphics processing unit (GPU). In this paper, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multigrid (AMG) preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage and efficient data mapping to the many-core architecture of the GPU. Our on-GPU assembly achieves up to an 87× speedup over a traditional serial implementation on the CPU. Focusing on the linear system solver alone, we achieve a speedup of up to 51× versus a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based, sparse, linear solvers.
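The preconditioned conjugate gradient loop at the heart of such a solver is compact. The sketch below uses a simple diagonal (Jacobi) preconditioner as a stand-in for the paper's geometry-informed AMG preconditioner, and a 1D Poisson matrix as a stand-in for an FEM stiffness matrix; both are illustrative assumptions, not the authors' GPU code:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive
    definite matrix A. M_inv(r) applies the preconditioner; here a
    diagonal Jacobi solve stands in for an AMG V-cycle."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1D Poisson matrix as a stand-in for an FEM stiffness matrix.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d_inv = 1.0 / np.diag(A)           # Jacobi: invert the diagonal
x = pcg(A, b, lambda r: d_inv * r)
```

On the GPU the matrix-vector products, dot products, and preconditioner applications are the kernels that must be parallelized; the loop structure is unchanged.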
Effects of high-frequency damping on iterative convergence of implicit viscous solver
NASA Astrophysics Data System (ADS)
Nishikawa, Hiroaki; Nakashima, Yoshitaka; Watanabe, Norihiko
2017-11-01
This paper discusses effects of high-frequency damping on iterative convergence of an implicit defect-correction solver for viscous problems. The study targets a finite-volume discretization with a one-parameter family of damped viscous schemes. The parameter α controls high-frequency damping: zero damping with α = 0, and larger damping for larger α (> 0). Convergence rates are predicted for a model diffusion equation by a Fourier analysis over a practical range of α. It is shown that the convergence rate attains its minimum at α = 1 on regular quadrilateral grids, and deteriorates for larger values of α. A similar behavior is observed for regular triangular grids. On both quadrilateral and triangular grids, the solver is predicted to diverge for α smaller than approximately 0.5. Numerical results are shown for the diffusion equation and the Navier-Stokes equations on regular and irregular grids. The study suggests that α = 1 and 4/3 are suitable values for robust and efficient computations, and α = 4/3 is recommended for the diffusion equation, for which it achieves higher-order accuracy on regular quadrilateral grids. Finally, a Jacobian-free Newton-Krylov solver with the implicit solver (a low-order Jacobian approximately inverted by a multi-color Gauss-Seidel relaxation scheme) used as a variable preconditioner is recommended for practical computations, as it provides robust and efficient convergence for a wide range of α.
NASA Astrophysics Data System (ADS)
Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.
2017-12-01
We propose efficient single-step formulations for reinitialization and extending algorithms, which are critical components of level-set based interface-tracking methods. The level-set field is reinitialized with a single-step (non-iterative) "forward tracing" algorithm. A minimum set of cells is defined that describes the interface, and reinitialization employs only data from these cells. Fluid states are extrapolated or extended across the interface by a single-step "backward tracing" algorithm. Both algorithms, which are motivated by analogy to ray-tracing, avoid the multiple block-boundary data exchanges that are inevitable for iterative reinitialization and extending approaches within a parallel-computing environment. The single-step algorithms are combined with a multi-resolution conservative sharp-interface method and validated by a wide range of benchmark test cases. We demonstrate that the proposed reinitialization method achieves second-order accuracy in conserving the volume of each phase. The interface location is invariant to reapplication of the single-step reinitialization. Generally, we observe smaller absolute errors than for standard iterative reinitialization on the same grid. The computational efficiency is higher than for the standard and typical high-order iterative reinitialization methods. We observe a 2- to 6-fold efficiency improvement over the standard method for serial execution. The proposed single-step extending algorithm, which is commonly employed for assigning data to ghost cells with ghost-fluid or conservative interface interaction methods, shows about a 10-fold efficiency improvement over the standard method while maintaining the same accuracy. Despite their simplicity, the proposed algorithms offer an efficient and robust alternative to iterative reinitialization and extending methods for level-set based multi-phase simulations.
NASA Technical Reports Server (NTRS)
Craig, Roy R., Jr.
1987-01-01
The major accomplishments of this research are: (1) the refinement and documentation of a multi-input, multi-output modal parameter estimation algorithm which is applicable to general linear, time-invariant dynamic systems; (2) the development and testing of an unsymmetric block-Lanczos algorithm for reduced-order modeling of linear systems with arbitrary damping; and (3) the development of a control-structure-interaction (CSI) test facility.
Sinogram restoration for ultra-low-dose x-ray multi-slice helical CT by nonparametric regression
NASA Astrophysics Data System (ADS)
Jiang, Lu; Siddiqui, Khan; Zhu, Bin; Tao, Yang; Siegel, Eliot
2007-03-01
During the last decade, x-ray computed tomography (CT) has been applied to screen large asymptomatic smoking and nonsmoking populations for early lung cancer detection. Because a larger population will be involved in such screening exams, more and more attention has been paid to low-dose and even ultra-low-dose x-ray CT. However, reducing CT radiation exposure increases the noise level in the sinogram, thereby degrading the quality of reconstructed CT images and causing more streak artifacts near the apices of the lung. Thus, how to reduce the noise levels and streak artifacts in low-dose CT images has become a meaningful topic. Since multi-slice helical CT has replaced conventional stop-and-shoot CT in many clinical applications, this research focused mainly on noise reduction in multi-slice helical CT. The experimental data were provided by a Siemens SOMATOM Sensation 16-slice helical CT scanner and included both conventional CT data acquired under a 120 kVp, 119 mA protocol and ultra-low-dose CT data acquired under a 120 kVp, 10 mA protocol; all other settings were the same as those of conventional CT. In this paper, a nonparametric smoothing method with thin-plate smoothing splines and a roughness penalty is proposed to restore the ultra-low-dose CT raw data. Each projection frame was first divided into blocks, and the 2D data in each block were then fitted to a thin-plate smoothing-spline surface by minimizing a roughness-penalized least-squares objective function. In this way, the noise in each ultra-low-dose CT projection was reduced by leveraging the information contained not only within each individual projection profile but also among nearby profiles. Finally, the restored ultra-low-dose projection data were fed into the standard filtered back-projection (FBP) algorithm to reconstruct CT images.
The reconstruction results, together with a comparison between the proposed approach and the traditional method, are given in the results and discussion section and show the effectiveness of the proposed thin-plate-based nonparametric regression method.
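The block-wise restoration step can be illustrated with SciPy's thin-plate-spline smoother, where the `smoothing` parameter plays the role of the roughness-penalty weight trading data fidelity against surface smoothness. The data below are a synthetic stand-in for one projection block, not the Siemens data used in the paper:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Synthetic stand-in for one block of a noisy ultra-low-dose projection:
# a smooth attenuation profile corrupted by heavy noise.
nx, ny = 16, 16
gx, gy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
pts = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
clean = np.exp(-((gx - 8.0) ** 2 + (gy - 8.0) ** 2) / 40.0)
noisy = clean + 0.2 * rng.standard_normal(clean.shape)

# Thin-plate smoothing spline fit to the scattered block data.
spline = RBFInterpolator(pts, noisy.ravel(),
                         kernel="thin_plate_spline", smoothing=1.0)
restored = spline(pts).reshape(nx, ny)

def roughness(img):
    """Sum of squared second differences: a discrete roughness measure."""
    return float((np.diff(img, 2, axis=0) ** 2).sum()
                 + (np.diff(img, 2, axis=1) ** 2).sum())
```

The fitted surface is markedly smoother than the raw block, which is exactly the trade the roughness penalty buys before the data are passed on to FBP.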
Technology-based design and scaling for RTGs for space exploration in the 100 W range
NASA Astrophysics Data System (ADS)
Summerer, Leopold; Pierre Roux, Jean; Pustovalov, Alexey; Gusev, Viacheslav; Rybkin, Nikolai
2011-04-01
This paper presents the results of a study on design considerations for a 100 W radioisotope thermo-electric generator (RTG). Special emphasis has been put on designing a modular, multi-purpose system with high overall TRL levels and making full use of the extensive Russian heritage in the design of radioisotope power systems. The modular approach allowed insight into the scaling of such RTGs covering the electric power range from 50 to 200 We (EoL). The retained concept is based on a modular thermal block structure, a radiative inner-RTG heat transfer and a two-stage thermo-electric conversion system.
Lithosphere structure of the west Qinling orogenic belt revealed by deep seismic reflection profile
NASA Astrophysics Data System (ADS)
Wang, H.
2009-12-01
The west Qinling orogen, located on the northeastern margin of the Qinghai-Tibet plateau, is a transformation zone between the N-S-trending and E-W-trending tectonics of the Chinese continent. Further study of the fine crustal structure of the west Qinling orogen and its relationships with the surrounding basins is of great significance for understanding the tectonic response of the northeastern margin of the plateau to the collisional convergence of the Indian and Asian blocks, and for understanding the formation and evolution of the plateau. In 2009, we reprocessed the data of the Tangke-Hezuo deep seismic reflection profiles collected in 2004 across the west Qinling orogen and the northern Songpan block. The new results show the fine lithospheric structure of the west Qinling orogen. Reflection features indicate that an interface at 6.0-7.0 s (TWT) divides the crust into upper and lower crust, whose structural styles and deformation are totally different. Integrating geological data, we deduce that the interface at 6.0-7.0 s (18-21 km depth) is a basement detachment that decoupled the deformation of the upper and lower crust. The multi-layered reflections in the upper crust reveal the sedimentary cover of the west Qinling orogen, disclose the thicknesses of the various structural layers and their degree of deformation, and provide a basis for the prospective evaluation of multi-metallic mineral and energy exploration. The north-dipping strong reflections of the lower crust in the west Qinling orogen constitute an imbricate structure; such imbricate structural features provide seismological evidence for the thrusting of the west Qinling toward the northern Songpan block, and are of great significance for studying the formation and evolution of the Songpan-Garze structure.
Moho reflections, observed around 17.0-17.2 s, are nearly horizontal, which implies that the west Qinling orogen underwent intense post-orogenic extension that thinned the lithosphere and produced a nearly level Moho. The study was financed by the National Natural Science Foundation of China (Nos. 40830316 and 40604010), the basic scientific research fund of the Ministry of Science and Technology of the People's Republic of China, and SinoProbe-02.
Wheeler, David C; Czarnota, Jenna; Jones, Resa M
2017-01-01
Socioeconomic status (SES) is often considered a risk factor for health outcomes. SES is typically measured using individual variables of educational attainment, income, housing, and employment, or a composite of these variables. Approaches to building the composite variable include using equal weights for each variable or estimating the weights with principal components analysis or factor analysis. However, these methods do not consider the relationship between the outcome and the SES variables when constructing the index. In this project, we used weighted quantile sum (WQS) regression to estimate an area-level SES index and its effect in a model of colonoscopy screening adherence in the Minnesota-Wisconsin Metropolitan Statistical Area. We considered several specifications of the SES index, including different spatial scales (e.g., census block-group level, tract level) for the SES variables. We found a significant positive association (odds ratio = 1.17, 95% CI: 1.15-1.19) between the SES index and colonoscopy adherence in the best-fitting model. The model with the best goodness-of-fit included a multi-scale SES index with 10 variables at the block-group level and one at the tract level, with home ownership, race, and income among the most important variables. Contrary to previous index constructions, our results were not consistent with an assumption of equal importance of the variables in the SES index when explaining colonoscopy screening adherence. Our approach is applicable in any study where an SES index is considered as a variable in a regression model and the weights for the SES variables are not known in advance.
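A minimal sketch of WQS estimation, assuming a logistic link, quantile-scored predictors, and direct constrained likelihood maximization (the published method also uses bootstrap ensembles and a validation split, which are omitted here):

```python
import numpy as np
from scipy.optimize import minimize

def quantile_scores(X, q=4):
    """Replace each column with its quantile rank (0..q-1)."""
    S = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], np.linspace(0, 1, q + 1)[1:-1])
        S[:, j] = np.digitize(X[:, j], cuts)
    return S

def fit_wqs(X, y, q=4):
    """Weighted quantile sum logistic regression, minimal sketch:
    jointly estimates non-negative variable weights constrained to sum
    to one, plus an intercept b0 and an index effect b1."""
    S = quantile_scores(X, q)
    p = X.shape[1]

    def negloglik(theta):
        w, b0, b1 = theta[:p], theta[p], theta[p + 1]
        eta = b0 + b1 * (S @ w)
        # Numerically stable Bernoulli negative log-likelihood.
        return np.sum(np.logaddexp(0.0, eta) - y * eta)

    x0 = np.concatenate([np.full(p, 1.0 / p), [0.0, 0.1]])
    res = minimize(negloglik, x0, method="SLSQP",
                   bounds=[(0, 1)] * p + [(None, None)] * 2,
                   constraints={"type": "eq",
                                "fun": lambda t: t[:p].sum() - 1.0})
    return res.x[:p], res.x[p], res.x[p + 1]

# Simulated area-level data: three SES variables, one dominant.
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 3))
eta = -0.5 + 1.5 * (0.7 * X[:, 0] + 0.3 * X[:, 1])
y = (rng.random(300) < 1 / (1 + np.exp(-eta))).astype(float)
w, b0, b1 = fit_wqs(X, y)
```

The estimated weights are directly interpretable as each variable's share of the index, which is what distinguishes WQS from equal-weight or PCA-based composites.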
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jing Yanfei, E-mail: yanfeijing@uestc.edu.c; Huang Tingzhu, E-mail: tzhuang@uestc.edu.c; Duan Yong, E-mail: duanyong@yahoo.c
This study is mainly focused on iterative solutions with simple diagonal preconditioning to two complex-valued nonsymmetric systems of linear equations arising from a computational chemistry model problem proposed by Sherry Li of NERSC. Numerical experiments show the feasibility of iterative methods for these problems to some extent and reveal the competitiveness of our recently proposed Lanczos biconjugate A-orthonormalization methods with other classic and popular iterative methods. The experimental results also indicate that application-specific preconditioners may be required to accelerate convergence.
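Simple diagonal preconditioning of a complex nonsymmetric system can be sketched with SciPy's GMRES. The matrix below is a synthetic, diagonally dominant stand-in, not the NERSC chemistry matrices, and GMRES stands in for the Lanczos biconjugate A-orthonormalization methods studied in the paper:

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

rng = np.random.default_rng(1)

# Small complex nonsymmetric, diagonally dominant test system.
n = 40
A = (np.diag(10.0 + 1.0j + np.arange(n))
     + 0.1 * (rng.standard_normal((n, n))
              + 1j * rng.standard_normal((n, n))))
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Simple diagonal (Jacobi) preconditioning: approximate A^-1 by 1/diag(A).
d_inv = 1.0 / np.diag(A)
M = LinearOperator((n, n), matvec=lambda v: d_inv * v, dtype=complex)

x, info = gmres(A, b, M=M)
```

The preconditioner costs one elementwise multiply per application, which is why diagonal scaling is the natural first baseline before more elaborate, application-specific preconditioners.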
Weighted graph based ordering techniques for preconditioned conjugate gradient methods
NASA Technical Reports Server (NTRS)
Clift, Simon S.; Tang, Wei-Pai
1994-01-01
We describe the basis of a matrix ordering heuristic for improving the incomplete factorization used in preconditioned conjugate gradient techniques applied to anisotropic PDEs. Several new matrix ordering techniques, derived from well-known algorithms in combinatorial graph theory, which attempt to implement this heuristic, are described. These ordering techniques are tested against a number of matrices arising from linear anisotropic PDEs, and compared with other matrix ordering techniques. A variation of reverse Cuthill-McKee (RCM) ordering is shown to generally improve the quality of incomplete factorization preconditioners.
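The effect of an RCM-style reordering on matrix bandwidth, one proxy for incomplete-factorization quality, can be demonstrated with SciPy's reverse Cuthill-McKee routine on a model 2D Laplacian. This illustrates plain RCM itself, not the paper's new weighted-graph orderings:

```python
import numpy as np
from scipy.sparse import kron, identity, diags
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum distance of any nonzero entry from the diagonal."""
    coo = A.tocoo()
    return int(np.abs(coo.row - coo.col).max())

# 5-point Laplacian on a 12x12 grid: a typical (isotropic) PDE matrix.
n = 12
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (kron(identity(n), T) + kron(T, identity(n))).tocsr()

# Scramble the unknown ordering, then let RCM recover a narrow band.
rng = np.random.default_rng(0)
p = rng.permutation(n * n)
A_bad = A[p][:, p].tocsr()
perm = reverse_cuthill_mckee(A_bad, symmetric_mode=True)
A_rcm = A_bad[perm][:, perm].tocsr()
```

A narrow band limits the fill an incomplete factorization can discard, which is one reason ordering matters for preconditioner quality; for anisotropic problems the paper argues that edge weights, not just bandwidth, must guide the ordering.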
Domain decomposition in time for PDE-constrained optimization
Barker, Andrew T.; Stoll, Martin
2015-08-28
Here, PDE-constrained optimization problems have a wide range of applications, but they lead to very large and ill-conditioned linear systems, especially if the problems are time dependent. In this paper we outline an approach for dealing with such problems by decomposing them in time and applying an additive Schwarz preconditioner in time, so that we can take advantage of parallel computers to deal with the very large linear systems. We then illustrate the performance of our method on a variety of problems.
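The time-decomposed additive Schwarz idea can be sketched on a small model system: overlapping windows of time steps define diagonal blocks whose local solves are summed to form the preconditioner. Everything below (the tridiagonal model matrix, the window sizes, GMRES as the outer Krylov method) is an illustrative assumption, not the authors' setup:

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

def additive_schwarz(A, subdomains):
    """Additive Schwarz preconditioner from overlapping index windows:
    each application sums the exact solves on the diagonal blocks
    A[idx, idx]. Coarse corrections and restricted variants, which a
    production solver would add, are omitted in this sketch."""
    blocks = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]

    def apply(r):
        z = np.zeros_like(r)
        for idx, Ainv in blocks:
            z[idx] += Ainv @ r[idx]
        return z

    return apply

# Model all-at-once "time" system: tridiagonal coupling between steps,
# split into three overlapping windows of about 20 steps each.
n = 60
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
subs = [np.arange(max(0, s - 2), min(n, s + 22)) for s in (0, 20, 40)]
M = LinearOperator((n, n), matvec=additive_schwarz(A, subs))

x, info = gmres(A, b, M=M)
```

Because each window's solve is independent, the subdomain loop is exactly the part that parallelizes across processors in the time direction.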
CRASH: A BLOCK-ADAPTIVE-MESH CODE FOR RADIATIVE SHOCK HYDRODYNAMICS-IMPLEMENTATION AND VERIFICATION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van der Holst, B.; Toth, G.; Sokolov, I. V.
We describe the Center for Radiative Shock Hydrodynamics (CRASH) code, a block-adaptive-mesh code for multi-material radiation hydrodynamics. The implementation solves the radiation diffusion model with a gray or multi-group method and uses a flux-limited diffusion approximation to recover the free-streaming limit. Electrons and ions are allowed to have different temperatures and we include flux-limited electron heat conduction. The radiation hydrodynamic equations are solved in the Eulerian frame by means of a conservative finite-volume discretization in either one-, two-, or three-dimensional slab geometry or in two-dimensional cylindrical symmetry. An operator-split method is used to solve these equations in three substeps: (1) an explicit step of a shock-capturing hydrodynamic solver; (2) a linear advection of the radiation in frequency-logarithm space; and (3) an implicit solution of the stiff radiation diffusion, heat conduction, and energy exchange. We present a suite of verification test problems to demonstrate the accuracy and performance of the algorithms. The applications are for astrophysics and laboratory astrophysics. The CRASH code is an extension of the Block-Adaptive Tree Solarwind Roe Upwind Scheme (BATS-R-US) code with a new radiation transfer and heat conduction library and equation-of-state and multi-group opacity solvers. Both CRASH and BATS-R-US are part of the publicly available Space Weather Modeling Framework.
Ammari, Faten; Bassel, Léna; Ferrier, Catherine; Lacanette, Delphine; Chapoulie, Rémy; Bousquet, Bruno
2016-10-01
In this study, multi-block analysis was applied for the first time to LIBS spectra provided by a portable LIBS system (IVEA Solution, France) equipped with three compact Czerny-Turner spectrometers covering the spectral ranges 200-397 nm, 398-571 nm and 572-1000 nm. 41 geological samples taken from a laboratory cave situated in the Vézère valley, an area of south-west France rich in prehistoric sites and decorated caves listed as UNESCO world heritage, were analyzed. They were composed of limestone and clay, considered as underlying supports, and of two types of alteration referred to as moonmilk and coralloid. Common Components and Specific Weights Analysis (CCSWA) allowed the moonmilk and coralloid samples to be sorted. The loadings revealed higher amounts of magnesium, silicon, aluminum and strontium in coralloids, and the saliences emphasized that, among the three spectrometers installed in the LIBS instrument used in this work, the one covering the range 572-1000 nm contributed least. This new approach to processing LIBS data not only gives good results for sorting geological materials but also clearly reveals which spectral range contains most of the information. This specific advantage of multi-block analysis could, for some applications, lead to simpler designs and smaller LIBS instruments.
NASA Astrophysics Data System (ADS)
Grant, K. D.; Panas, M.
2016-12-01
NOAA and NASA are jointly acquiring the next-generation civilian weather satellite system: the Joint Polar Satellite System (JPSS). JPSS replaced the afternoon orbit component and ground processing of NOAA's old POES system. JPSS satellites carry sensors that collect meteorological, oceanographic, climatological, and solar-geophysical observations of the earth, atmosphere, and space. The ground processing system for JPSS is known as the JPSS Common Ground System (JPSS CGS). Developed and maintained by Raytheon Intelligence, Information and Services (IIS), the CGS is a globally distributed, multi-mission system serving NOAA, NASA and their national and international partners. The CGS has demonstrated its scalability and flexibility to incorporate multiple missions efficiently and with minimal cost, schedule and risk, while strengthening global partnerships in weather and environmental monitoring. The CGS architecture has been upgraded to Block 2.0 to satisfy several key objectives, including: "operationalizing" the first satellite, Suomi NPP, which originally was a risk reduction mission; leveraging lessons learned in multi-mission support; taking advantage of newer, more reliable and efficient technologies; and satisfying constraints imposed by the continually evolving budgetary environment. To ensure the CGS meets these needs, we have developed 48 Technical Performance Measures (TPMs) across 9 categories: Data Availability, Data Latency, Operational Availability, Margin, Scalability, Situational Awareness, Transition (between environments and sites), WAN Efficiency, and Data Recovery Processing. This paper will provide an overview of the CGS Block 2.0 architecture, with particular focus on the 9 TPM categories listed above. We will describe how we ensure the deployed architecture meets these TPMs to satisfy our multi-mission objectives with the deployment of Block 2.0.
Arterial and venous plasma levels of bupivacaine following peripheral nerve blocks.
Moore, D C; Mather, L E; Bridenbaugh, L D; Balfour, R I; Lysons, D F; Horton, W G
1976-01-01
Mean arterial plasma (MAP) and peripheral mean venous plasma (MVP) levels of bupivacaine were ascertained in 3 groups of 10 patients each for: (1) intercostal nerve block, 400 mg; (2) block of the sciatic, femoral, and lateral femoral cutaneous nerves, with or without block of the obturator nerve, 400 mg; and (3) supraclavicular brachial plexus block, 300 mg. MAP levels were consistently higher than simultaneously sampled MVP levels, the highest levels occurring from bilateral intercostal nerve block. No evidence of systemic toxicity was observed. The results suggest that bupivacaine has a much wider margin of safety in humans than is now stated.
Automatic Orientation of Large Blocks of Oblique Images
NASA Astrophysics Data System (ADS)
Rupnik, E.; Nex, F.; Remondino, F.
2013-05-01
Nowadays, multi-camera platforms combining nadir and oblique cameras are experiencing a revival. Due to advantages such as ease of interpretation, completeness through mitigation of occluded areas, as well as system accessibility, they have found their place in numerous civil applications. However, automatic post-processing of such imagery still remains a topic of research. The configuration of the cameras poses a challenge to the traditional photogrammetric pipeline used in commercial software, and manual measurements are inevitable; for large image blocks this is certainly an impediment. In the theoretical part of the work we review three common least-squares adjustment methods and recap possible ways of orienting a multi-camera system. In the practical part we present an approach that successfully oriented a block of 550 images acquired with an imaging system composed of 5 cameras (Canon EOS 1D Mark III) with different focal lengths. The oblique cameras are rotated in the four looking directions (forward, backward, left and right) by 45° with respect to the nadir camera. The workflow relies only upon open-source software: a tool developed to analyse image connectivity and Apero to orient the image block. The benefits of the connectivity tool are twofold: in computational time and in the success of the Bundle Block Adjustment. It exploits the georeferenced information provided by the Applanix system to constrain feature point extraction to relevant images only, and it guides the concatenation of images during the relative orientation. Ultimately an absolute transformation is performed, resulting in mean re-projection residuals equal to 0.6 pix.
NASA Technical Reports Server (NTRS)
Von der Porten, Paul; Ahmad, Naeem; Hawkins, Matt; Fill, Thomas
2018-01-01
NASA is currently building the Space Launch System (SLS) Block-1 launch vehicle for the Exploration Mission 1 (EM-1) test flight. NASA is also designing the next evolution of SLS, the Block-1B. The Block-1 and Block-1B vehicles will use the Powered Explicit Guidance (PEG) algorithm (of Space Shuttle heritage) for closed-loop guidance. To accommodate vehicle capabilities and design for future evolutions of SLS, modifications were made to PEG for Block-1 to handle multi-phase burns, provide PEG with updated propulsion information, and react to a core-stage engine-out. In addition, due to the relatively low thrust-to-weight ratio of the Exploration Upper Stage (EUS) and the EUS carrying out Lunar Vicinity and Earth Escape missions, certain enhancements to the Block-1 PEG algorithm are needed to perform Block-1B missions, to account for long burn arcs and for targeting translunar and hyperbolic orbits. This paper describes the design and implementation of the modifications to the Block-1 PEG algorithm relative to the Space Shuttle. Furthermore, this paper illustrates the challenges posed by the Block-1B vehicle and the required PEG enhancements. These improvements make PEG suitable for use on the SLS Block-1B vehicle as part of the Guidance, Navigation, and Control (GN&C) System.
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore + SIMD) on shared-memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full-core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops, i.e., 40.74% of the SMP node peak performance, for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool.
The biometric recognition on contactless multi-spectrum finger images
NASA Astrophysics Data System (ADS)
Kang, Wenxiong; Chen, Xiaopeng; Wu, Qiuxia
2015-01-01
This paper presents a novel multimodal biometric system based on contactless multi-spectrum finger images, which aims to deal with the limitations of unimodal biometrics. The chief merits of the system are the richness of the permissible texture and the ease of data access. We constructed a multi-spectrum instrument to simultaneously acquire three different types of biometrics from a finger: contactless fingerprint, finger vein, and knuckleprint. On the basis of the samples with these characteristics, a moderate database was built for the evaluation of our system. Considering the real-time requirements and the respective characteristics of the three biometrics, the block local binary patterns algorithm was used for feature extraction and matching for the fingerprints and finger veins, while the Oriented FAST and Rotated BRIEF algorithm was applied for knuckleprints. Finally, score-level fusion was performed on the matching results from the aforementioned three types of biometrics. The experiments showed that our proposed multimodal biometric recognition system achieves an equal error rate of 0.109%, which is 88.9%, 94.6%, and 89.7% lower than the individual fingerprint, knuckleprint, and finger vein recognitions, respectively. Moreover, our proposed system satisfies the real-time requirements of the applications.
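Score-level fusion of the kind described can be sketched as a weighted sum of min-max-normalized matcher scores. The weights, normalization choice, and function names below are assumptions for illustration, not details taken from the paper.

```python
def min_max_normalize(scores):
    """Map raw matcher scores onto [0, 1] so modalities are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(fingerprint, vein, knuckle, weights=(0.4, 0.4, 0.2)):
    """Weighted-sum score-level fusion of three per-modality score lists.

    The fused scores can then be compared against a single decision
    threshold for accept/reject.
    """
    fp = min_max_normalize(fingerprint)
    fv = min_max_normalize(vein)
    kp = min_max_normalize(knuckle)
    return [weights[0] * a + weights[1] * b + weights[2] * c
            for a, b, c in zip(fp, fv, kp)]
```

In practice the weights would be tuned on a validation set to minimize the fused equal error rate.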
Schrank, Simone; Jedinger, Nicole; Wu, Shengqian; Piller, Michael; Roblegg, Eva
2016-07-25
In this work, calcium stearate (CaSt) multi-particulates loaded with codeine phosphate (COP) were developed in an attempt to provide extended release (ER) combined with alcohol dose dumping (ADD) resistance. The pellets were prepared via wet extrusion/spheronization, and ER characteristics were obtained after fluid bed drying at 30°C. Pore blockers (i.e., xanthan, guar gum and TiO2) were integrated to control the uptake of ethanolic media, the CaSt swelling and, consequently, the COP release. While all three pore blockers are insoluble in ethanol, in water xanthan dissolves, guar gum swells, and TiO2 does not interact. The incorporation of 10 and 15% TiO2 still provided ER characteristics and yielded ADD resistance in up to 40 v% ethanol. The in-vitro data were subjected to PK simulations, which revealed similar codeine plasma levels when the medication is used concomitantly with alcoholic beverages. Taken together, the in-vitro and in-silico results demonstrate that the incorporation of appropriate pore blockers presents a promising strategy to provide ADD resistance of multi-particulate systems.
Conformally encapsulated multi-electrode arrays with seamless insulation
Tabada, Phillipe J.; Shah, Kedar G.; Tolosa, Vanessa; Pannu, Satinderpall S.; Tooker, Angela; Delima, Terri; Sheth, Heeral; Felix, Sarah
2016-11-22
Thin-film multi-electrode arrays (MEA) having one or more electrically conductive beams conformally encapsulated in a seamless block of electrically insulating material, and methods of fabricating such MEAs using reproducible, microfabrication processes. One or more electrically conductive traces are formed on scaffold material that is subsequently removed to suspend the traces over a substrate by support portions of the trace beam in contact with the substrate. By encapsulating the suspended traces, either individually or together, with a single continuous layer of an electrically insulating material, a seamless block of electrically insulating material is formed that conforms to the shape of the trace beam structure, including any trace backings which provide suspension support. Electrical contacts, electrodes, or leads of the traces are exposed from the encapsulated trace beam structure by removing the substrate.
Corey, John A.
1985-01-01
A multi-cylinder hot gas engine having an equal angle, V-shaped engine block in which two banks of parallel, equal length, equally sized cylinders are formed together with annular regenerator/cooler units surrounding each cylinder, and wherein the pistons are connected to a single crankshaft. The hot gas engine further includes an annular heater head disposed around a central circular combustor volume having a new balanced-flow hot-working-fluid manifold assembly that provides optimum balanced flow of the working fluid through the heater head working fluid passageways which are connected between each of the cylinders and their respective associated annular regenerator units. This balanced flow provides even heater head temperatures and, therefore, maximum average working fluid temperature for best operating efficiency with the use of a single crankshaft V-shaped engine block.
NASA Astrophysics Data System (ADS)
Feng, Wenqiang; Guo, Zhenlin; Lowengrub, John S.; Wise, Steven M.
2018-01-01
We present a mass-conservative full approximation storage (FAS) multigrid solver for cell-centered finite difference methods on block-structured, locally Cartesian grids. The algorithm is essentially a standard adaptive FAS (AFAS) scheme, but with a simple modification that comes in the form of a mass-conservative correction to the coarse-level force. This correction is facilitated by the creation of a zombie variable, analogous to a ghost variable, but defined on the coarse grid and lying under the fine grid refinement patch. We show that a number of different types of fine-level ghost cell interpolation strategies could be used in our framework, including low-order linear interpolation. In our approach, the smoother, prolongation, and restriction operations need never be aware of the mass conservation conditions at the coarse-fine interface. To maintain global mass conservation, we need only modify the usual FAS algorithm by correcting the coarse-level force function at points adjacent to the coarse-fine interface. We demonstrate through simulations that the solver converges geometrically, at a rate that is h-independent, and we show the generality of the solver, applying it to several nonlinear, time-dependent, and multi-dimensional problems. In several tests, we show that second-order asymptotic (h → 0) convergence is observed for the discretizations, provided that (1) at least linear interpolation of the ghost variables is employed, and (2) the mass conservation corrections are applied to the coarse-level force term.
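The coarse-grid correction cycle underlying any FAS-type solver can be sketched on the linear 1D Poisson problem with damped-Jacobi smoothing. This generic two-grid sketch is for orientation only; it omits the paper's distinctive ingredients (the nonlinear FAS transfer, zombie variables, and the mass-conservative coarse-level force correction), and all names are invented for the example.

```python
def smooth(u, f, h, sweeps, w=2.0/3.0):
    """Damped-Jacobi smoothing for -u'' = f with zero Dirichlet boundaries
    (interior unknowns only)."""
    n = len(u)
    for _ in range(sweeps):
        new = list(u)
        for i in range(n):
            left = u[i-1] if i > 0 else 0.0
            right = u[i+1] if i < n-1 else 0.0
            new[i] = (1.0 - w)*u[i] + w*0.5*(left + right + h*h*f[i])
        u = new
    return u

def residual(u, f, h):
    """r = f - A u for the standard three-point Laplacian."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i-1] if i > 0 else 0.0
        right = u[i+1] if i < n-1 else 0.0
        r.append(f[i] - (2.0*u[i] - left - right)/(h*h))
    return r

def two_grid_cycle(u, f, h):
    """One two-grid correction cycle, the building block of a V-cycle."""
    u = smooth(u, f, h, 3)                      # pre-smoothing
    r = residual(u, f, h)
    nc = (len(u) - 1) // 2                      # coarse point j at fine 2j+1
    rc = [0.25*r[2*j] + 0.5*r[2*j+1] + 0.25*r[2*j+2] for j in range(nc)]
    # "solve" the coarse problem with many smoothing sweeps; a real code
    # would recurse or solve directly
    ec = smooth([0.0]*nc, rc, 2.0*h, 200)
    e = [0.0]*len(u)                            # linear prolongation
    for j in range(nc):
        e[2*j+1] = ec[j]
        e[2*j] += 0.5*ec[j]
        if 2*j + 2 < len(u):
            e[2*j+2] += 0.5*ec[j]
    u = [ui + ei for ui, ei in zip(u, e)]       # coarse-grid correction
    return smooth(u, f, h, 3)                   # post-smoothing
```

One cycle reduces the residual by a large factor, which is the geometric, h-independent behavior the abstract reports for the full solver.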
TIGGERC: Turbomachinery Interactive Grid Generator for 2-D Grid Applications and Users Guide
NASA Technical Reports Server (NTRS)
Miller, David P.
1994-01-01
A two-dimensional multi-block grid generator has been developed for a new design and analysis system for studying multiple blade-row turbomachinery problems. TIGGERC is a mouse-driven, interactive grid generation program which can be used to modify boundary coordinates and grid packing and generates surface grids using a hyperbolic tangent or algebraic distribution of grid points on the block boundaries. The interior points of each block grid are distributed using a transfinite interpolation approach. TIGGERC can generate a blocked axisymmetric H-grid, C-grid, I-grid or O-grid for studying turbomachinery flow problems. TIGGERC was developed for operation on Silicon Graphics workstations. A detailed discussion of the grid generation methodology, menu options, operational features and sample grid geometries is presented.
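Transfinite interpolation of interior points from block boundaries follows the classical Coons-patch formula: blend the four boundary curves and subtract the bilinear corner contribution. A minimal sketch of the idea (the names and the point-list interface are illustrative, not TIGGERC's actual API):

```python
def transfinite_grid(bottom, top, left, right):
    """Fill the interior of one block from its four boundary point lists.

    bottom/top hold nu (x, y) points, left/right hold nv (x, y) points;
    corner points must agree, e.g. bottom[0] == left[0].
    """
    nu, nv = len(bottom), len(left)
    grid = [[None]*nv for _ in range(nu)]
    for i in range(nu):
        u = i / (nu - 1)
        for j in range(nv):
            v = j / (nv - 1)
            pt = []
            for k in range(2):  # x component, then y component
                edge = ((1-v)*bottom[i][k] + v*top[i][k]
                        + (1-u)*left[j][k] + u*right[j][k])
                corner = ((1-u)*(1-v)*bottom[0][k] + u*(1-v)*bottom[-1][k]
                          + (1-u)*v*top[0][k] + u*v*top[-1][k])
                pt.append(edge - corner)
            grid[i][j] = tuple(pt)
    return grid
```

With the four boundaries of the unit square, the interior reduces to the uniform (u, v) lattice, which makes the formula easy to check.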
The use of functional chemical-protein associations to identify multi-pathway renoprotectants.
Xu, Jia; Meng, Kexin; Zhang, Rui; Yang, He; Liao, Chang; Zhu, Wenliang; Jiao, Jundong
2014-01-01
Typically, most nephropathies can be categorized as complex human diseases in which the cumulative effect of multiple minor genes, combined with environmental and lifestyle factors, determines the disease phenotype. Thus, multi-target drugs would be more likely to facilitate comprehensive renoprotection than single-target agents. In this study, functional chemical-protein association analysis was performed to retrieve multi-target drugs of high pathway wideness from the STITCH 3.1 database. Pathway wideness of a drug evaluated the efficiency of regulation of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in quantity. We identified nine experimentally validated renoprotectants that exerted remarkable impact on KEGG pathways by targeting a limited number of proteins. We selected curcumin as an illustrative compound to display the advantage of multi-pathway drugs on renoprotection. We compared curcumin with hemin, an agonist of heme oxygenase-1 (HO-1), which significantly affects only one KEGG pathway, porphyrin and chlorophyll metabolism (adjusted p = 1.5×10⁻⁵). At the same concentration (10 µM), both curcumin and hemin equivalently mitigated oxidative stress in H2O2-treated glomerular mesangial cells. The benefit of using hemin was derived from its agonistic effect on HO-1, providing relief from oxidative stress. Selective inhibition of HO-1 completely blocked the action of hemin but not that of curcumin, suggesting simultaneous multi-pathway intervention by curcumin. Curcumin also increased cellular autophagy levels, enhancing its protective effect; however, hemin had no effects. Based on the fact that the dysregulation of multiple pathways is implicated in the etiology of complex diseases, we proposed a feasible method for identifying multi-pathway drugs from compounds with validated targets. Our efforts will help identify multi-pathway agents capable of providing comprehensive protection against renal injuries.
Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis
1998-02-01
The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures. Specifically, three multi-function interface modules were targeted for development: an Electro-Optic Interface (EOI), an Optical Interconnection Unit
Well-posed and stable transmission problems
NASA Astrophysics Data System (ADS)
Nordström, Jan; Linders, Viktor
2018-07-01
We introduce the notion of a transmission problem to describe a general class of problems where different dynamics are coupled in time. Well-posedness and stability are analysed for continuous and discrete problems using both strong and weak formulations, and a general transmission condition is obtained. The theory is applied to the coupling of fluid-acoustic models, multi-grid implementations, adaptive mesh refinements, multi-block formulations and numerical filtering.
NASA Astrophysics Data System (ADS)
Niu, Xiaoliang; Yuan, Fen; Huang, Shanguo; Guo, Bingli; Gu, Wanyi
2011-12-01
A dynamic clustering scheme based on the coordination of management and control is proposed to reduce the network congestion rate and improve the blocking performance of hierarchical routing in multi-layer, multi-region intelligent optical networks. Its implementation relies on mobile agent (MA) technology, which has the advantages of efficiency, flexibility, functionality and scalability. The paper's major contribution is to adjust domains dynamically when the performance of the working network is not ideal. In addition, the incorporation of centralized NMS and distributed MA control technology migrates the computing process to control-plane nodes, which relieves the burden on the NMS and improves processing efficiency. Experiments are conducted on the Multi-layer and Multi-region Simulation Platform for Optical Network (MSPON) to assess the performance of the scheme.
Self-cleaning threaded rod spinneret for high-efficiency needleless electrospinning
NASA Astrophysics Data System (ADS)
Zheng, Gaofeng; Jiang, Jiaxin; Wang, Xiang; Li, Wenwang; Zhong, Weizheng; Guo, Shumin
2018-07-01
High-efficiency production of nanofibers is the key to the application of electrospinning technology. This work focuses on multi-jet electrospinning, in which a threaded rod electrode is utilized as the needleless spinneret to achieve high-efficiency production of nanofibers. A slipper block, which fits into and moves along the threaded rod, is designed to transfer polymer solution evenly to the surface of the rod spinneret. The relative motion between the slipper block and the threaded rod electrode promotes the unstable fluctuation of the solution surface, so the rotation of the threaded rod electrode decreases the critical voltage for the initial multi-jet ejection and the diameter of the nanofibers. The residual solution on the surface of the threaded rod is cleaned up by the moving slipper block, showing a great self-cleaning ability, which ensures stable multi-jet ejection and increases the productivity of nanofibers. Each thread of the threaded rod electrode serves as an independent spinneret, which enhances the electric field strength and constrains the position of the Taylor cone, resulting in high productivity of uniform nanofibers. The diameter of the nanofibers decreases with increasing threaded rod rotation speed, and the productivity increases with the solution flow rate. The rotation of the electrode provides an excess force for the ejection of charged jets, which also contributes to the high-efficiency production of nanofibers. The maximum productivity of nanofibers from the threaded rod spinneret is 5-6 g/h, about 250-300 times as high as that from the single-needle spinneret. The self-cleaning threaded rod spinneret is an effective way to realize continuous multi-jet electrospinning, which promotes industrial applications of uniform nanofibrous membranes.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics-card-based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize the CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on an 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. The proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerate the DFC analyses significantly. The developed algorithms make DFC analysis more practical for multi-subject studies and more dynamic analyses.
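The core computation that the block-based approach decomposes, windowed correlation between regional time-courses, can be sketched serially in a few lines. A plain-Python illustration without any GPU specifics (function names are invented for the example):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    sxy = sum((a - mx)*(b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx)**2 for a in x))
    sy = math.sqrt(sum((b - my)**2 for b in y))
    return sxy / (sx * sy)

def sliding_window_fc(ts_a, ts_b, window, step=1):
    """Dynamic functional connectivity between two ROI time-courses:
    one correlation value per sliding-window position."""
    return [pearson(ts_a[i:i+window], ts_b[i:i+window])
            for i in range(0, len(ts_a) - window + 1, step)]
```

Each window position is independent of the others, which is exactly what makes the computation amenable to the thread-based and block-based parallelization the abstract describes.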
Hu, Weiming; Li, Xi; Luo, Wenhan; Zhang, Xiaoqin; Maybank, Stephen; Zhang, Zhongfei
2012-12-01
Object appearance modeling is crucial for tracking objects, especially in videos captured by nonstationary cameras and for reasoning about occlusions between multiple moving objects. Based on the Log-Euclidean Riemannian metric on symmetric positive definite matrices, we propose an incremental Log-Euclidean Riemannian subspace learning algorithm in which covariance matrices of image features are mapped into a vector space with the Log-Euclidean Riemannian metric. Based on the subspace learning algorithm, we develop a Log-Euclidean block-division appearance model which captures both the global and local spatial layout information about object appearances. Single-object tracking and multi-object tracking with occlusion reasoning are then achieved by particle filtering-based Bayesian state inference. During tracking, incremental updating of the Log-Euclidean block-division appearance model captures changes in object appearance. For multi-object tracking, the appearance models of the objects can be updated even in the presence of occlusions. Experimental results demonstrate that the proposed tracking algorithm obtains more accurate results than six state-of-the-art tracking algorithms.
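The Log-Euclidean mapping sends an SPD covariance matrix to its matrix logarithm, after which ordinary vector-space arithmetic (and hence linear subspace learning) applies. For the 2×2 symmetric case the eigendecomposition is closed-form; this is a simplified sketch for illustration, not the paper's feature pipeline:

```python
import math

def logm_spd2(a, b, c):
    """Matrix logarithm of the SPD 2x2 matrix [[a, b], [b, c]].

    Returns the (a, b, c) entries of the (symmetric) log matrix, computed
    as V diag(log eigenvalues) V^T from the closed-form eigendecomposition.
    """
    mean = (a + c) / 2.0
    d = math.hypot((a - c) / 2.0, b)
    l1, l2 = mean + d, mean - d          # eigenvalues, l1 >= l2 > 0
    if d < 1e-15:                        # multiple eigenvalue: log(l) * I
        return (math.log(l1), 0.0, math.log(l1))
    if abs(b) > 1e-15:                   # unit eigenvector for l1
        vx, vy = b, l1 - a
    else:                                # diagonal input
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    f1, f2 = math.log(l1), math.log(l2)
    return (f1*vx*vx + f2*vy*vy,         # [0,0] entry
            (f1 - f2)*vx*vy,             # off-diagonal entry
            f1*vy*vy + f2*vx*vx)         # [1,1] entry
```

In the log domain, averaging or projecting these entries is a plain Euclidean operation, which is the property the incremental subspace learning algorithm exploits.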
Dense blocks of energetic ions driven by multi-petawatt lasers
Weng, S. M.; Liu, M.; Sheng, Z. M.; Murakami, M.; Chen, M.; Yu, L. L.; Zhang, J.
2016-01-01
Laser-driven ion accelerators have the advantages of compact size, high density, and short bunch duration over conventional accelerators. Nevertheless, it is still challenging to simultaneously enhance the yield and quality of laser-driven ion beams for practical applications. Here we propose a scheme to address this challenge via the use of emerging multi-petawatt lasers and a density-modulated target. The density-modulated target permits its ions to be uniformly accelerated as a dense block by laser radiation pressure. In addition, the beam quality of the accelerated ions is remarkably improved by embedding the target in a thick enough substrate, which suppresses hot electron refluxing and thus alleviates plasma heating. Particle-in-cell simulations demonstrate that almost all ions in a solid-density plasma of a few microns can be uniformly accelerated to about 25% of the speed of light by a laser pulse at an intensity around 10²² W/cm². The resulting dense block of energetic ions may drive fusion ignition and more generally create matter with unprecedented high energy density. PMID:26924793
NASA Astrophysics Data System (ADS)
Biss, Matthew; Murphy, Michael; Lieber, Mark
2017-06-01
Experiments were conducted in an effort to qualify a multi-diagnostic characterization procedure for the performance output of a detonator when fired into a poly(methyl methacrylate) (PMMA) witness block. A suite of optical diagnostics was utilized in combination to both bound the shock wave interaction state at the detonator/PMMA interface and characterize the nature of the shock wave decay in PMMA. The diagnostics included the Shock Wave Image Framing Technique (SWIFT), a photocathode tube streak camera, and photonic Doppler velocimetry (PDV). High-precision, optically clear witness blocks permitted dynamic flow visualization of the shock wave in PMMA via focused shadowgraphy. SWIFT- and streak-imaging diagnostics captured the spatiotemporally evolving shock wave, providing a two-dimensional temporally discrete image set and a one-dimensional temporally continuous image, respectively. PDV provided the temporal velocity history of the detonator output along the detonator axis. By combining these results, a bound was placed on the interface condition, and a more physical profile representing the shock wave decay in PMMA for an exploding-bridgewire detonator was achieved.
Poston, Brach; Van Gemmert, Arend W.A.; Sharma, Siddharth; Chakrabarti, Somesh; Zavaremi, Shahrzad H.; Stelmach, George
2013-01-01
The minimum variance theory proposes that motor commands are corrupted by signal-dependent noise and smooth trajectories with low noise levels are selected to minimize endpoint error and endpoint variability. The purpose of the study was to determine the contribution of trajectory smoothness to the endpoint accuracy and endpoint variability of rapid multi-joint arm movements. Young and older adults performed arm movements (4 blocks of 25 trials) as fast and as accurately as possible to a target with the right (dominant) arm. Endpoint accuracy and endpoint variability along with trajectory smoothness and error were quantified for each block of trials. Endpoint error and endpoint variance were greater in older adults compared with young adults, but decreased at a similar rate with practice for the two age groups. The greater endpoint error and endpoint variance exhibited by older adults were primarily due to impairments in movement extent control and not movement direction control. The normalized jerk was similar for the two age groups, but was not strongly associated with endpoint error or endpoint variance for either group. However, endpoint variance was strongly associated with endpoint error for both the young and older adults. Finally, trajectory error was similar for both groups and was weakly associated with endpoint error for the older adults. The findings are not consistent with the predictions of the minimum variance theory, but support and extend previous observations that movement trajectories and endpoints are planned independently. PMID:23584101
Multigrid Strategies for Viscous Flow Solvers on Anisotropic Unstructured Meshes
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
Unstructured multigrid techniques for relieving the stiffness associated with high-Reynolds number viscous flow simulations on extremely stretched grids are investigated. One approach consists of employing a semi-coarsening or directional-coarsening technique, based on the directions of strong coupling within the mesh, in order to construct more optimal coarse grid levels. An alternate approach is developed which employs directional implicit smoothing with regular fully coarsened multigrid levels. The directional implicit smoothing is obtained by constructing implicit lines in the unstructured mesh based on the directions of strong coupling. Both approaches yield large increases in convergence rates over the traditional explicit full-coarsening multigrid algorithm. However, maximum benefits are achieved by combining the two approaches in a coupled manner into a single algorithm. An order of magnitude increase in convergence rate over the traditional explicit full-coarsening algorithm is demonstrated, and convergence rates for high-Reynolds number viscous flows which are independent of the grid aspect ratio are obtained. Further acceleration is provided by incorporating low-Mach-number preconditioning techniques, and a Newton-GMRES strategy which employs the multigrid scheme as a preconditioner. The compounding effects of these various techniques on speed of convergence are documented through several example test cases.
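Each implicit line in the directional smoother gives rise to a tridiagonal system along the direction of strong coupling, solvable in O(n) by the Thomas algorithm. A generic sketch of that line solve (not the implementation described here):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system T x = d by the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]                      # forward elimination
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]                           # back substitution
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Because the solve is linear in the line length, sweeping all implicit lines costs no more per iteration than a pointwise smoother, while removing the stiffness along the stretched direction.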
Ertikin, Aysun; Argun, Güldeniz; Mısırlıoğlu, Mesut; Aydın, Murat; Arıkan, Murat; Kadıoğulları, Nihal
2017-10-01
In this study, we aimed to compare axillary brachial plexus block using the two-injection and four-injection techniques assisted with ultrasonography (USG) and a nerve stimulator in patients operated on for carpal tunnel syndrome with articaine. To evaluate which technique is more effective, we compared the onset time, effectiveness, and duration of the block procedures, patient satisfaction, adverse effects of the drug, and complication rates of the motor and sensory blocks. Sixty patients were randomly divided into two groups. A mixture (30 mL) of articaine with NaHCO3 and physiologic serum was injected into the patients' axillae in both groups. After blockage of the musculocutaneous nerve in both groups, the median nerve in the two-injection group and the median, ulnar, and radial nerves in the four-injection group were blocked. In the brachial plexus nerves, sensory blockage was evaluated with the pinprick test, and motor block was evaluated by contraction of the muscles innervated by each nerve. The adverse effects and complications, visual analog scale (VAS) values during the operation, and post-operative patient satisfaction were recorded. Sufficient analgesia and anaesthesia were achieved with no need for additional local anaesthetics in either group. Furthermore, additional sedation requirements were found to be similar in both groups. A faster and more effective complete block was achieved in more patients in the four-injection group. In the two-injection group, the block could not be achieved for the radial nerve in one patient. All other nerves were successfully blocked. Whereas the blockage procedure lasted longer in the four-injection group, the VAS values recorded during the blockage procedure were also higher in the four-injection group. No statistical difference was found with regard to patient satisfaction, and no adverse effects or complications were observed in either group.
Although the multi-injection method takes more time, it provides faster anaesthesia and more complete blockage than the two-injection method used with articaine. The two-injection method can also be used in specific surgery, such as for carpal tunnel syndrome, as an alternative to the multi-injection method.
Multi-resonant scatterers in sonic crystals: Locally multi-resonant acoustic metamaterial
NASA Astrophysics Data System (ADS)
Romero-García, V.; Krynkin, A.; Garcia-Raffi, L. M.; Umnova, O.; Sánchez-Pérez, J. V.
2013-01-01
An acoustic metamaterial made of a two-dimensional (2D) periodic array of multi-resonant acoustic scatterers is analyzed both experimentally and theoretically. The building blocks consist of a combination of elastic beams of low-density polyethylene foam (LDPF) with cavities of known area. Elastic resonances of the beams and acoustic resonances of the cavities can be excited by sound, producing several attenuation peaks in the low frequency range. Due to this behavior, the periodic array with long-wavelength multi-resonant structural units can be classified as a locally multi-resonant acoustic metamaterial (LMRAM) with strong dispersion of its effective properties. The results presented in this paper could be used to design effective tunable acoustic filters for the low frequency range.
Preconditioned Alternating Projection Algorithms for Maximum a Posteriori ECT Reconstruction
Krol, Andrzej; Li, Si; Shen, Lixin; Xu, Yuesheng
2012-01-01
We propose a preconditioned alternating projection algorithm (PAPA) for solving the maximum a posteriori (MAP) emission computed tomography (ECT) reconstruction problem. Specifically, we formulate the reconstruction problem as a constrained convex optimization problem with the total variation (TV) regularization. We then characterize the solution of the constrained convex optimization problem and show that it satisfies a system of fixed-point equations defined in terms of two proximity operators raised from the convex functions that define the TV-norm and the constraint involved in the problem. The characterization (of the solution) via the proximity operators that define two projection operators naturally leads to an alternating projection algorithm for finding the solution. For efficient numerical computation, we introduce to the alternating projection algorithm a preconditioning matrix (the EM-preconditioner) for the dense system matrix involved in the optimization problem. We prove theoretically convergence of the preconditioned alternating projection algorithm. In numerical experiments, performance of our algorithms, with an appropriately selected preconditioning matrix, is compared with performance of the conventional MAP expectation-maximization (MAP-EM) algorithm with TV regularizer (EM-TV) and that of the recently developed nested EM-TV algorithm for ECT reconstruction. Based on the numerical experiments performed in this work, we observe that the alternating projection algorithm with the EM-preconditioner outperforms significantly the EM-TV in all aspects including the convergence speed, the noise in the reconstructed images and the image quality. It also outperforms the nested EM-TV in the convergence speed while providing comparable image quality. PMID:23271835
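The role of proximity operators as generalized projections can be illustrated with toy ingredients: soft thresholding is the proximity operator of the ℓ1 norm, and von Neumann alternating projections between two convex sets converge to a point in their intersection. This is a schematic analogue with invented sets, not the ECT reconstruction algorithm itself:

```python
import math

def prox_l1(x, lam):
    """Proximity operator of lam*||x||_1: componentwise soft thresholding."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def project_nonneg(x):
    """Projection onto the nonnegative orthant (a simple convex set)."""
    return [max(v, 0.0) for v in x]

def project_hyperplane(x, total=1.0):
    """Projection onto the hyperplane {x : sum(x) = total}."""
    shift = (sum(x) - total) / len(x)
    return [v - shift for v in x]

def alternating_projection(x, iters=100):
    """Alternate the two projections; the iterates approach a point that
    satisfies both constraints simultaneously."""
    for _ in range(iters):
        x = project_nonneg(project_hyperplane(x))
    return x
```

The PAPA iteration has the same alternating structure, but with proximity operators derived from the TV-norm and the data constraint, plus the EM-preconditioner to accelerate convergence.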
Shadid, J. N.; Pawlowski, R. P.; Cyr, E. C.; ...
2016-02-10
The computational solution of the governing balance equations for mass, momentum, heat transfer and magnetic induction for resistive magnetohydrodynamics (MHD) systems can be extremely challenging. These difficulties arise from both the strong nonlinear, nonsymmetric coupling of fluid and electromagnetic phenomena, as well as the significant range of time- and length-scales that the interactions of these physical mechanisms produce. This paper explores the development of a scalable, fully-implicit stabilized unstructured finite element (FE) capability for 3D incompressible resistive MHD. The discussion considers the development of a stabilized FE formulation in the context of the variational multiscale (VMS) method, and describes the scalable implicit time integration and direct-to-steady-state solution capability. The nonlinear solver strategy employs Newton-Krylov methods, which are preconditioned using fully-coupled algebraic multilevel preconditioners. These preconditioners are shown to enable a robust, scalable and efficient solution approach for the large-scale sparse linear systems generated by the Newton linearization. Verification results demonstrate the expected order of accuracy for the stabilized FE discretization. The approach is tested on a variety of prototype problems that include MHD duct flows, an unstable hydromagnetic Kelvin-Helmholtz shear layer, and a 3D island coalescence problem used to model magnetic reconnection. Initial results that explore the scaling of the solution methods are also presented on up to 128K processors for problems with up to 1.8B unknowns on a Cray XK7.
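The Newton-Krylov structure described, an outer Newton iteration whose linear steps are solved by a preconditioned Krylov method, can be sketched on a small semilinear model problem. Here the inner solver is Jacobi-preconditioned CG rather than the fully-coupled algebraic multilevel preconditioners of the paper; the model problem and all names are invented for the example.

```python
def cg(matvec, b, precond, tol=1e-12, maxiter=50):
    """Preconditioned conjugate gradient for an SPD operator."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = sum(ri*zi for ri, zi in zip(r, z))
    for _ in range(maxiter):
        if rz == 0.0:
            break
        Ap = matvec(p)
        alpha = rz / sum(pi*qi for pi, qi in zip(p, Ap))
        x = [xi + alpha*pi for xi, pi in zip(x, p)]
        r = [ri - alpha*qi for ri, qi in zip(r, Ap)]
        if max(abs(ri) for ri in r) < tol:
            break
        z = precond(r)
        rz_new = sum(ri*zi for ri, zi in zip(r, z))
        p = [zi + (rz_new/rz)*pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def newton_krylov(u, f, h=1.0, newton_iters=10):
    """Newton's method for the semilinear system A u + u^3 = f, with
    A = tridiag(-1, 2, -1)/h^2; inner solves by Jacobi-preconditioned CG."""
    n = len(u)
    def apply_A(v):
        return [(2.0*v[i] - (v[i-1] if i > 0 else 0.0)
                 - (v[i+1] if i < n-1 else 0.0)) / (h*h) for i in range(n)]
    for _ in range(newton_iters):
        F = [a + ui**3 - fi for a, ui, fi in zip(apply_A(u), u, f)]
        if max(abs(v) for v in F) < 1e-12:
            break
        diag = [2.0/(h*h) + 3.0*ui*ui for ui in u]
        jac_mv = lambda p, u=u: [a + 3.0*ui*ui*pi
                                 for a, ui, pi in zip(apply_A(p), u, p)]
        jac_pc = lambda r, d=diag: [ri/di for ri, di in zip(r, d)]
        delta = cg(jac_mv, [-v for v in F], jac_pc)  # Newton correction
        u = [ui + di for ui, di in zip(u, delta)]
    return u
```

The quality of the inner preconditioner is what governs scalability in the large-scale setting; for this tiny SPD Jacobian, even a diagonal preconditioner suffices.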
Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction
NASA Astrophysics Data System (ADS)
Krol, Andrzej; Li, Si; Shen, Lixin; Xu, Yuesheng
2012-11-01
We propose a preconditioned alternating projection algorithm (PAPA) for solving the maximum a posteriori (MAP) emission computed tomography (ECT) reconstruction problem. Specifically, we formulate the reconstruction problem as a constrained convex optimization problem with the total variation (TV) regularization. We then characterize the solution of the constrained convex optimization problem and show that it satisfies a system of fixed-point equations defined in terms of two proximity operators raised from the convex functions that define the TV-norm and the constraint involved in the problem. The characterization (of the solution) via the proximity operators that define two projection operators naturally leads to an alternating projection algorithm for finding the solution. For efficient numerical computation, we introduce to the alternating projection algorithm a preconditioning matrix (the EM-preconditioner) for the dense system matrix involved in the optimization problem. We prove theoretically convergence of the PAPA. In numerical experiments, performance of our algorithms, with an appropriately selected preconditioning matrix, is compared with performance of the conventional MAP expectation-maximization (MAP-EM) algorithm with TV regularizer (EM-TV) and that of the recently developed nested EM-TV algorithm for ECT reconstruction. Based on the numerical experiments performed in this work, we observe that the alternating projection algorithm with the EM-preconditioner outperforms significantly the EM-TV in all aspects including the convergence speed, the noise in the reconstructed images and the image quality. It also outperforms the nested EM-TV in the convergence speed while providing comparable image quality.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feng, Wenqiang, E-mail: wfeng1@vols.utk.edu; Salgado, Abner J., E-mail: asalgad1@utk.edu; Wang, Cheng, E-mail: cwang1@umassd.edu
We describe and analyze preconditioned steepest descent (PSD) solvers for fourth- and sixth-order nonlinear elliptic equations that include p-Laplacian terms on periodic domains in 2 and 3 dimensions. The highest- and lowest-order terms of the equations are constant-coefficient, positive linear operators, which suggests a natural preconditioning strategy. Such nonlinear elliptic equations often arise from time discretization of parabolic equations that model various biological and physical phenomena, in particular, liquid crystals, thin film epitaxial growth and phase transformations. The analyses of the schemes involve the characterization of the strictly convex energies associated with the equations. We first give a general framework for PSD in Hilbert spaces. Based on certain reasonable assumptions on the linear preconditioner, a geometric convergence rate is shown for the nonlinear PSD iteration. We then apply the general theory to the fourth- and sixth-order problems of interest, making use of Sobolev embedding and regularity results to confirm the appropriateness of our preconditioners for the regularized p-Laplacian problems. Our results include a sharper theoretical convergence result for p-Laplacian systems compared to what may be found in existing works. We demonstrate rigorously how to apply the theory in the finite-dimensional setting using finite difference discretization methods. Numerical simulations for some important physical application problems, including thin film epitaxy with slope selection and the square phase field crystal model, are carried out to verify the efficiency of the scheme.
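The structure the abstract exploits, a constant-coefficient SPD leading operator used as the preconditioner for a convex nonlinear energy, can be sketched with a second-order finite-difference toy (an assumed stand-in; the paper's equations are fourth- and sixth-order):

```python
import numpy as np

# PSD toy: minimize E(u) = 0.5 u'Lu + 0.25 sum(u^4) - b'u, where the
# constant-coefficient SPD part L (a scaled discrete Laplacian) serves
# as the preconditioner.  Same structure as the paper, lower order.
n = 50
h = 1.0 / (n + 1)
L = (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
b = np.linspace(0.0, 1.0, n)

def grad(u):
    return L @ u + u**3 - b

def energy(u):
    return 0.5 * u @ L @ u + 0.25 * np.sum(u**4) - b @ u

u = np.zeros(n)
for _ in range(100):
    d = np.linalg.solve(L, -grad(u))   # preconditioned search direction
    a = 1.0
    for _ in range(40):                # bounded backtracking line search
        if energy(u + a*d) <= energy(u):
            break
        a *= 0.5
    u = u + a*d
```

Because the nonlinearity is a small perturbation of the linear part here, the preconditioned iteration converges geometrically, mirroring the rate the paper proves in the Hilbert-space framework.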
Parallel computing techniques for rotorcraft aerodynamics
NASA Astrophysics Data System (ADS)
Ekici, Kivanc
The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, modifications are made to the Lower-Upper Symmetric Gauss-Seidel (LU-SGS) implicit operator originally used in TURNS. Second, an inexact Newton method coupled with a Krylov subspace iterative method (a Newton-Krylov method) is applied. Both techniques had previously been tried for the Euler mode of the code; in this work, we extend them to the Navier-Stokes mode. Several new implicit operators were tried because traditional operators exhibit convergence problems on the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient application of Newton-Krylov methods to the Navier-Stokes mode of TURNS, effective preconditioners must be used; the parallel implicit operators developed in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol is used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g., OVERFLOW).
A Tensor-Train accelerated solver for integral equations in complex geometries
NASA Astrophysics Data System (ADS)
Corona, Eduardo; Rahimian, Abtin; Zorin, Denis
2017-04-01
We present a framework using the Quantized Tensor Train (QTT) decomposition to accurately and efficiently solve volume and boundary integral equations in three dimensions. We describe how the QTT decomposition can be used as a hierarchical compression and inversion scheme for matrices arising from the discretization of integral equations. For a broad range of problems, the computational and storage costs of the inversion scheme are extremely modest, O(log N), and once the inverse is computed, it can be applied in O(N log N) operations. We analyze the QTT ranks for hierarchically low-rank matrices and discuss their relationship to commonly used hierarchical compression techniques such as FMM and HSS. We prove that the QTT ranks are bounded for translation-invariant systems and argue that this behavior extends to non-translation-invariant volume and boundary integrals. For volume integrals, the QTT decomposition provides an efficient direct solver requiring significantly less memory than other fast direct solvers. We present results demonstrating the remarkable performance of the QTT-based solver when applied to both translation- and non-translation-invariant volume integrals in 3D. For boundary integral equations, we demonstrate that using a QTT decomposition to construct preconditioners for a Krylov subspace method leads to an efficient and robust solver with a small memory footprint. We test the QTT preconditioners in the iterative solution of an exterior elliptic boundary value problem (Laplace) formulated as a boundary integral equation in complex, multiply connected geometries.
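The core QTT idea (reshape a length-2**d vector into a d-way binary tensor, then compress by successive rank-revealing SVDs) can be sketched with the textbook TT-SVD algorithm. This is a generic illustration of QTT compression, not the paper's hierarchical inversion scheme:

```python
import numpy as np

def tt_svd(v, d, tol=1e-10):
    """Quantize a length-2**d vector into d TT cores via successive SVDs."""
    cores, r = [], 1
    c = v.reshape(1, -1)
    for _ in range(d - 1):
        c = c.reshape(r * 2, -1)           # fold one binary digit into rows
        U, s, Vt = np.linalg.svd(c, full_matrices=False)
        keep = max(1, int(np.sum(s > tol * s[0])))   # rank truncation
        cores.append(U[:, :keep].reshape(r, 2, keep))
        c = s[:keep, None] * Vt[:keep]
        r = keep
    cores.append(c.reshape(r, 2, 1))
    return cores

def tt_to_vec(cores):
    m = cores[0].reshape(2, -1)
    for c in cores[1:]:
        m = (m @ c.reshape(c.shape[0], -1)).reshape(-1, c.shape[2])
    return m.ravel()

# A smooth function has tiny QTT ranks, so storage is O(log N).
d = 10
x = np.sin(np.linspace(0.0, 1.0, 2**d))
cores = tt_svd(x, d)
```

For the sampled sine the ranks stay at 2, so the 1024 entries are stored in a handful of small cores, which is the storage behavior the abstract exploits at much larger scale.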
NASA Astrophysics Data System (ADS)
Feng, Wenqiang; Salgado, Abner J.; Wang, Cheng; Wise, Steven M.
2017-04-01
We describe and analyze preconditioned steepest descent (PSD) solvers for fourth- and sixth-order nonlinear elliptic equations that include p-Laplacian terms on periodic domains in 2 and 3 dimensions. The highest- and lowest-order terms of the equations are constant-coefficient, positive linear operators, which suggests a natural preconditioning strategy. Such nonlinear elliptic equations often arise from time discretization of parabolic equations that model various biological and physical phenomena, in particular, liquid crystals, thin film epitaxial growth and phase transformations. The analyses of the schemes involve the characterization of the strictly convex energies associated with the equations. We first give a general framework for PSD in Hilbert spaces. Based on certain reasonable assumptions on the linear preconditioner, a geometric convergence rate is shown for the nonlinear PSD iteration. We then apply the general theory to the fourth- and sixth-order problems of interest, making use of Sobolev embedding and regularity results to confirm the appropriateness of our preconditioners for the regularized p-Laplacian problems. Our results include a sharper theoretical convergence result for p-Laplacian systems compared to what may be found in existing works. We demonstrate rigorously how to apply the theory in the finite-dimensional setting using finite difference discretization methods. Numerical simulations for some important physical application problems, including thin film epitaxy with slope selection and the square phase field crystal model, are carried out to verify the efficiency of the scheme.
Self-recovery reversible image watermarking algorithm
Sun, He; Gao, Shangbing; Jin, Shenghua
2018-01-01
The integrity of image content is essential, yet most watermarking algorithms can authenticate an image without being able to automatically repair damaged areas or restore the original image. In this paper, a self-recovery reversible image watermarking algorithm is proposed to recover tampered areas effectively. First, the original image is divided into homogeneous and non-homogeneous blocks through multi-scale decomposition, and the feature information of each block is calculated as the recovery watermark. Then, the original image is divided into 4×4 non-overlapping blocks classified into smooth blocks and texture blocks according to image texture. Finally, the recovery watermark generated by homogeneous blocks and error-correcting codes is embedded into the corresponding smooth block by mapping; watermark information generated by non-homogeneous blocks and error-correcting codes is embedded into the corresponding non-embedded smooth block and the texture block via mapping. Correlation attacks are detected using invariant moments when the watermarked image is attacked. To determine whether a sub-block has been tampered with, its feature is calculated and the recovery watermark is extracted from the corresponding block. If the image has been tampered with, it can be recovered. The experimental results show that the proposed algorithm can effectively recover tampered areas with high accuracy and high quality. The algorithm is characterized by sound visual quality and excellent image restoration. PMID:29920528
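The first stage, splitting the image into non-overlapping 4×4 blocks and classifying them by texture, is easy to sketch. The variance threshold below is an illustrative assumption; the paper's actual classifier and watermark construction are not reproduced here.

```python
import numpy as np

# Split an image into non-overlapping 4x4 blocks, then label each block
# smooth or textured by its variance.  The threshold of 500 is purely
# illustrative, not the paper's criterion.
rng = np.random.default_rng(2)
img = rng.integers(0, 256, (8, 8)).astype(float)

bs = 4
blocks = img.reshape(img.shape[0] // bs, bs, img.shape[1] // bs, bs)
blocks = blocks.swapaxes(1, 2).reshape(-1, bs, bs)
labels = ["smooth" if blk.var() < 500.0 else "texture" for blk in blocks]
```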
Measuring health systems strength and its impact: experiences from the African Health Initiative.
Sherr, Kenneth; Fernandes, Quinhas; Kanté, Almamy M; Bawah, Ayaga; Condo, Jeanine; Mutale, Wilbroad
2017-12-21
Health systems are essential platforms for accessible, quality health services, and population health improvements. Global health initiatives have dramatically increased health resources; however, funding to strengthen health systems has not increased commensurately, partially due to concerns about health system complexity and evidence gaps demonstrating health outcome improvements. In 2009, the African Health Initiative of the Doris Duke Charitable Foundation began supporting Population Health Implementation and Training Partnership projects in five sub-Saharan African countries (Ghana, Mozambique, Rwanda, Tanzania, and Zambia) to catalyze significant advances in strengthening health systems. This manuscript reflects on the experience of establishing an evaluation framework to measure health systems strength, and associate measures with health outcomes, as part of this Initiative. Using the World Health Organization's health systems building block framework, the Partnerships present novel approaches to measure health systems building blocks and summarize data across and within building blocks to facilitate analytic procedures. Three Partnerships developed summary measures spanning the building blocks using principal component analysis (Ghana and Tanzania) or the balanced scorecard (Zambia). Other Partnerships developed summary measures to simplify multiple indicators within individual building blocks, including health information systems (Mozambique), and service delivery (Rwanda). At the end of the project intervention period, one to two key informants from each Partnership's leadership team were asked to list - in rank order - the importance of the six building blocks in relation to their intervention. Though there were differences across Partnerships, service delivery and information systems were reported to be the most common focus of interventions, followed by health workforce and leadership and governance. 
Medical products, vaccines and technologies, and health financing were the building blocks reported to receive less focus. The African Health Initiative experience furthers the science of evaluation for health systems strengthening, highlighting areas for further methodological development: developing valid, feasible measures that are sensitive to interventions in multiple contexts (particularly in leadership and governance) and that describe interactions across building blocks; developing summary statistics to facilitate testing intervention effects on health systems and associations with health status; and designing appropriate analytic models for complex, multi-level open health systems.
Composite sandwich structure and method for making same
NASA Technical Reports Server (NTRS)
Magurany, Charles J. (Inventor)
1995-01-01
A core for a sandwich structure which has multi-ply laminate ribs separated by voids is made as an integral unit in one single curing step. Tooling blocks corresponding to the voids are first wrapped by strips of prepreg layup equal to one half of each rib laminate so a continuous wall of prepreg material is formed around the tooling blocks. The wrapped tooling blocks are next pressed together laterally, like tiles, so adjoining walls from two tooling blocks are joined. The assembly is then cured by conventional methods, and afterwards the tooling blocks are removed so voids are formed. The ribs can be provided with integral tabs forming bonding areas for face sheets, and face sheets may be co-cured with the core ribs. The new core design is suitable for discrete rib cores used in space telescopes and reflector panels, where quasi-isotropic properties and zero coefficient of thermal expansion are required.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doherty, Kimberly R., E-mail: kimberly.doherty@quintiles.com; Wappel, Robert L.; Talbert, Dominique R.
2013-10-01
Tyrosine kinase inhibitors (TKi) have greatly improved the treatment and prognosis of multiple cancer types. However, unexpected cardiotoxicity has arisen in a subset of patients treated with these agents that was not wholly predicted by pre-clinical testing, which centers around animal toxicity studies and inhibition of the human Ether-à-go-go-Related Gene (hERG) channel. Therefore, we sought to determine whether a multi-parameter test panel assessing the effect of drug treatment on cellular, molecular, and electrophysiological endpoints could accurately predict cardiotoxicity. We examined how 4 FDA-approved TKi agents impacted cell viability, apoptosis, reactive oxygen species (ROS) generation, metabolic status, impedance, and ion channel function in human cardiomyocytes. The 3 drugs clinically associated with severe cardiac adverse events (crizotinib, sunitinib, nilotinib) all proved to be cardiotoxic in our in vitro tests, while the relatively cardiac-safe drug erlotinib showed only minor changes in cardiac cell health. Crizotinib, an ALK/MET inhibitor, led to increased ROS production, caspase activation, cholesterol accumulation, disruption in cardiac cell beat rate, and blockage of ion channels. The multi-targeted TKi sunitinib showed decreased cardiomyocyte viability, AMPK inhibition, increased lipid accumulation, disrupted beat pattern, and hERG block. Nilotinib, a second-generation Bcr-Abl inhibitor, led to increased ROS generation, caspase activation, hERG block, and an arrhythmic beat pattern. Thus, each drug showed a unique toxicity profile that may reflect the multiple mechanisms leading to cardiotoxicity. This study demonstrates that a multi-parameter approach can provide a robust characterization of drug-induced cardiomyocyte damage that can be leveraged to improve drug safety during early phase development.
Highlights:
• TKi with known adverse effects show unique cardiotoxicity profiles in this panel.
• Crizotinib increases ROS, apoptosis, and cholesterol as well as alters beat rate.
• Sunitinib inhibits AMPK, increases lipids and alters the cardiac beat pattern.
• Nilotinib causes ROS and caspase activation, decreased lipids and arrhythmia.
• Erlotinib did not impact ROS, caspase, or lipid levels or affect the beat pattern.
Andersen, Marie Louise Max; Rasmussen, Morten Arendt; Pörksen, Sven; Svensson, Jannet; Vikre-Jørgensen, Jennifer; Thomsen, Jane; Hertel, Niels Thomas; Johannesen, Jesper; Pociot, Flemming; Petersen, Jacob Sten; Hansen, Lars; Mortensen, Henrik Bindesbøl; Nielsen, Lotte Brøndum
2013-01-01
The purpose of the present study is to explore the progression of type 1 diabetes (T1D) in Danish children 12 months after diagnosis using Latent Factor Modelling. We include three data blocks of dynamic paraclinical biomarkers, baseline clinical characteristics and genetic profiles of diabetes-related SNPs in the analyses. This method identified a model explaining 21.6% of the total variation in the data set. The model consists of two components: (1) A pattern of declining residual β-cell function positively associated with young age, presence of diabetic ketoacidosis and long duration of disease symptoms (P = 0.0004), and with risk alleles of WFS1, CDKN2A/2B and RNLS (P = 0.006). (2) A second pattern of high ZnT8 autoantibody levels and low postprandial glucagon levels associated with risk alleles of IFIH1, TCF2, TAF5L, IL2RA and PTPN2 and protective alleles of the ERBB3 gene (P = 0.0005). These results demonstrate that Latent Factor Modelling can identify association patterns in prospective clinical data; future functional studies will be needed to clarify the relevance of these patterns. PMID:23755131
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Yousef
2014-01-16
The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least-squares systems. The focus of the Minnesota team is on algorithm development, robustness issues, and tests and validation of the methods on realistic problems. 1. The project began with an investigation of how to practically update a preconditioner obtained from an ILU-type factorization when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in the specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners for solving linear systems arising from the Helmholtz equation; these are often difficult linear systems to solve by iterative methods. 4. We also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations when the matrix is highly indefinite; the strategy uses shifting in some optimal way, and the method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA, which was made publicly available; it was the first such library to offer complete iterative solvers for GPUs. 8. We considered another form of ILU that blends coarsening techniques from multigrid with algebraic multilevel methods. 9. We released a new version of our parallel solver pARMS (version 3); as part of this we tested the code in complex settings, including the solution of Maxwell and Helmholtz equations and a problem of crystal growth. 10. As an application of polynomial preconditioning, we considered the problem of evaluating f(A)v, which arises in statistical sampling. 11. As an application of the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix, which arises in statistical applications as well as in many applications in physics; we explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems based on low-rank approximations.
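The ILU-plus-Krylov combination that runs through the items above can be sketched with SciPy. This is a generic 2-D Poisson example of the pattern, an assumption for illustration, not pARMS or the project's codes:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# ILU-preconditioned GMRES on a 2-D Poisson matrix: the basic ILU/Krylov
# pattern described in the project summary (generic sketch, not pARMS).
n = 30
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(n * n)

ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)          # incomplete LU
M = spla.LinearOperator(A.shape, matvec=ilu.solve)          # M approximates A^-1
x, info = spla.gmres(A, b, M=M)                             # info == 0 on success
```

Tightening `drop_tol` makes the ILU closer to an exact factorization and cuts GMRES iterations at the cost of more fill, which is exactly the robustness/cost trade-off several of the project items address.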
Multi-linear sparse reconstruction for SAR imaging based on higher-order SVD
NASA Astrophysics Data System (ADS)
Gao, Yu-Fei; Gui, Guan; Cong, Xun-Chao; Yang, Yue; Zou, Yan-Bin; Wan, Qun
2017-12-01
This paper focuses on spotlight synthetic aperture radar (SAR) imaging of point scattering targets based on tensor modeling. In real-world scenarios, scatterers usually follow a block sparse distribution, a structural feature that previous studies of SAR imaging have scarcely exploited. Our work takes advantage of this property of the target scene, constructing a multi-linear sparse reconstruction algorithm for SAR imaging. This work introduces multi-linear block sparsity into the higher-order singular value decomposition (SVD), together with a dictionary-construction procedure. Simulation experiments with ideal point targets show the robustness of the proposed algorithm to noise and sidelobe disturbance, which often degrade the imaging quality of conventional methods. The computational resource requirements are also investigated: the complexity analysis shows that the present method consumes fewer resources than the classic matching pursuit method. Imaging experiments on practical measured data further demonstrate the effectiveness of the algorithm developed in this paper.
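The higher-order SVD underlying the multi-linear model above can be sketched in a few lines: one SVD per mode unfolding gives the factor matrices, and contracting them back yields the core tensor. This is the generic Tucker/HOSVD computation, not the paper's SAR imaging algorithm:

```python
import numpy as np

def mode_mult(T, U, mode):
    # Multiply tensor T by matrix U along the given mode.
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T):
    """Higher-order SVD: factor matrices from mode unfoldings, then the core."""
    factors = []
    for mode in range(T.ndim):
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U)
    core = T
    for mode, U in enumerate(factors):
        core = mode_mult(core, U.T, mode)
    return core, factors

T = np.random.default_rng(1).random((4, 5, 6))
core, factors = hosvd(T)

# With no rank truncation the factorization reproduces T exactly.
R = core
for mode, U in enumerate(factors):
    R = mode_mult(R, U, mode)
```

Truncating the columns of each factor matrix gives the compressed, block-sparse-friendly representation that the reconstruction algorithm builds on.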
NASA Astrophysics Data System (ADS)
Azimi, S.; Delavar, M. R.; Rajabifard, A.
2017-09-01
In response to natural disasters, efficient planning for the optimal allocation of medical assistance to the wounded as quickly as possible, and for the wayfinding of first responders, is of prime importance for minimizing risk. This paper proposes multi-agent based modeling for the optimal allocation of space to emergency centers according to population, the street network and the number of ambulances in each emergency center using constrained network Voronoi diagrams; for the wayfinding of ambulances from emergency centers to the wounded and back, based on minimum ambulance travel time and path length, implemented with NSGA; and for the use of smart-city facilities to accelerate the rescue operation. A simulated annealing algorithm is used to minimize the difference between demand and supply in the constrained network Voronoi diagrams. In the proposed multi-agent system, after the locations of the wounded and their symptoms are reported, the constrained network Voronoi diagram for each emergency center is determined. This process is performed simultaneously for multiple injuries in different Voronoi diagrams. The system also considers the priority of the injured for receiving medical assistance and the smart-city facilities for reporting blocked streets. Tehran Municipality District 5 was considered as the study area, and at 3-minute intervals volunteers reported blocked streets. The difference between supply and demand, divided by the supply in each Voronoi diagram, decreased to 0.1601, and the response time of the ambulances decreased by about 36.7%.
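A generic simulated-annealing loop of the kind used above to shrink the supply/demand mismatch can be sketched as follows. The objective here is a toy (equalize an integer split across four regions), an assumption for illustration, not the paper's Voronoi allocation model:

```python
import math
import random

def anneal(cost, state, neighbor, t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Minimize cost(state) by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    c = best_c = cost(state)
    best, t = state, t0
    for _ in range(steps):
        cand = neighbor(state, rng)
        cc = cost(cand)
        # Always accept non-worsening moves; accept uphill moves with
        # probability exp(-increase / temperature).
        if cc <= c or rng.random() < math.exp((c - cc) / t):
            state, c = cand, cc
            if c < best_c:
                best, best_c = state, c
        t *= cooling
    return best, best_c

cost = lambda s: max(s) - min(s)          # imbalance of the split

def neighbor(s, rng):
    s = list(s)
    i, j = rng.randrange(len(s)), rng.randrange(len(s))
    if s[i] > 0:
        s[i] -= 1
        s[j] += 1
    return s

best, best_c = anneal(cost, [100, 0, 0, 0], neighbor)
```

In the paper's setting the state would be an assignment of demand points to Voronoi regions and the cost the normalized supply/demand gap; the acceptance rule and cooling schedule are the same skeleton.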
NASA Astrophysics Data System (ADS)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-01
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti+1 (xi+1) by xi+1 = fi(xi), the dynamics problem spanning an interval from t0…tM can be transformed into a root finding problem, F(X) = [xi - f(xi-1)]i=1,…,M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3.
The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
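The all-at-once formulation is compact: stack the trajectory into X, define F(X) = [xi - f(xi-1)], and hand it to a root finder. The harmonic-oscillator toy below (an assumed stand-in, not the paper's MD/AIMD setting) uses a velocity-Verlet step and SciPy's Newton-Krylov solver:

```python
import numpy as np
from scipy.optimize import newton_krylov

# Trajectory as a root-finding problem: F(X) = [x_i - f(x_{i-1})] = 0,
# solved all at once.  Unit harmonic oscillator with a velocity-Verlet
# step stands in for the MD/AIMD integrators of the paper.
dt, M = 0.05, 40
x0 = np.array([1.0, 0.0])        # initial (position, velocity)

def f(x):
    q, v = x
    a = -q                        # force for the unit harmonic oscillator
    qn = q + dt*v + 0.5*dt*dt*a
    vn = v + 0.5*dt*(a - qn)      # velocity update uses a_new = -qn
    return np.array([qn, vn])

def F(X):
    X = X.reshape(M, 2)
    prev = np.vstack([x0, X[:-1]])
    return (X - np.array([f(p) for p in prev])).ravel()

X = newton_krylov(F, np.zeros(2 * M), f_tol=1e-10).reshape(M, 2)
```

In the parallel setting, the M residual entries of F are evaluated concurrently, one processor per time step; a cheaper coarse integrator can serve as the preconditioner, as the abstract notes.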
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti+1 (xi+1) by xi+1 = fi(xi), the dynamics problem spanning an interval from t0…tM can be transformed into a root finding problem, F(X) = [xi - f(xi-1)]i=1,…,M = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl+4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks.
Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl+4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.
Bylaska, Eric J; Weare, Jonathan Q; Weare, John H
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti+1 (xi+1) by xi+1 = fi(xi), the dynamics problem spanning an interval from t0…tM can be transformed into a root finding problem, F(X) = [xi - f(xi-1)]i=1,…,M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3.
The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
Toolan, Daniel T W; Adlington, Kevin; Isakova, Anna; Kalamiotis, Alexis; Mokarian-Tabari, Parvaneh; Dimitrakis, Georgios; Dodds, Christopher; Arnold, Thomas; Terrill, Nick J; Bras, Wim; Hermida Merino, Daniel; Topham, Paul D; Irvine, Derek J; Howse, Jonathan R
2017-08-09
Microwave annealing has emerged as an alternative to traditional thermal annealing approaches for optimising block copolymer self-assembly. A novel sample environment enabling small angle X-ray scattering to be performed in situ during microwave annealing is demonstrated, which has enabled, for the first time, the direct study of the effects of microwave annealing upon the self-assembly behavior of a model, commercial triblock copolymer system [polystyrene-block-poly(ethylene-co-butylene)-block-polystyrene]. Results show that the block copolymer is a poor microwave absorber, resulting in no change in the block copolymer morphology upon application of microwave energy. The block copolymer species may only indirectly interact with the microwave energy when a small molecule microwave-interactive species [diethylene glycol dibenzoate (DEGDB)] is incorporated directly into the polymer matrix. Then significant morphological development is observed at DEGDB loadings ≥6 wt%. Through spatial localisation of the microwave-interactive species, we demonstrate targeted annealing of specific regions of a multi-component system, opening routes for the development of "smart" manufacturing methodologies.
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks
NASA Technical Reports Server (NTRS)
Lee, Y. H.; Shin, K. G.
1982-01-01
A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
Numerical grid generation in computational field simulations. Volume 1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soni, B.K.; Thompson, J.F.; Haeuser, J.
1996-12-31
To enhance CFS technology to its next level of applicability (i.e., to create acceptance of CFS in integrated product and process development involving multidisciplinary optimization), the basic requirements are: rapid turn-around time, reliable and accurate simulation, affordability, and appropriate linkage to other engineering disciplines. In response to this demand, there has been considerable growth in grid-generation-related research activities involving automation, parallel processing, linkage with CAD-CAM systems, CFS with dynamic motion and moving boundaries, and strategies and algorithms associated with multi-block structured, unstructured, hybrid, hexahedral, and Cartesian grids, along with their applicability to various disciplines including biomedical, semiconductor, geophysical, ocean modeling, and multidisciplinary optimization.
Synthesis and Examination of New Catalytic Polymers.
1985-12-01
[Garbled report documentation page; recoverable fields follow.] Disclaimer: ...should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation. KEY WORDS: Enzymatic reactivity; Reactivity, theory of; Multi-armed systems. ABSTRACT (fragment): ...proximity is achieved in the cyclodextrin system by means of a more sophisticated and costly synthesis. This is an important consideration for use...
Thermal Assessment of Swift BAT Instrument Thermal Control System in Flight
NASA Technical Reports Server (NTRS)
Choi, Michael K.
2005-01-01
The BAT is the primary instrument on the Swift spacecraft. The Swift mission is part of the National Aeronautics and Space Administration (NASA) Medium-Size Explorer (MIDEX) Program, and is managed by Goddard Space Flight Center (GSFC). It is designed to detect gamma ray bursts over a broad region of the sky in a low Earth orbit of 600-km altitude and quickly align the telescopes on the spacecraft to the gamma ray source. It was successfully launched into orbit on November 20, 2004. The Swift mission is the first multi-wavelength transient observatory of its kind for gamma ray burst astronomy. Its mission life is 2 years. The inclination is 22 deg maximum. The spacecraft bus voltage to the instruments is in the 24 V to 35 V range. The instruments will be turned off when the voltage is below 27 V. The BAT is mounted to the optical bench through five titanium flexures. The BAT has been developed at GSFC. Its telescope assembly consists of 256 Detector Modules (DMs) in the Detector Array. There are 16 Detector Array Blocks. Each Block holds 16 DMs, 3 Block Voltage Regulator (BVR) units and 3 Block Command & Data Handling (BCDH) units. The power dissipation of each Block has been measured to be 13 W. Therefore the total power dissipation of the 16 Blocks is 208 W. The DAP, 1.3 m (4.3 ft) x 1 m (3.3 ft), accommodates all 16 Blocks. It also provides the mounting surface and the positional stability for the Blocks. The DMs are located at the top (+X side) of the DAP and are enclosed by graded-Z shields on the sides and a coded mask at the top. The BVRs and BCDHs are located at the bottom (-X side) of the DAP. Eight Blocks are located at the front (-Z side or radiator side) of the DAP, and eight are located at the rear (+Z side) of the DAP. The DMs and top of DAP are insulated with a 7-layer multi-layer insulation (MLI). There is a 5.08 cm (2 in) x 5.08 cm (2 in) MLI cutout over each Block heater controller so that heat radiates from the heater controller to the mask.
The exterior of the mask, graded-Z shields and bottom of DAP is insulated with a 15-layer MLI.
NASA Technical Reports Server (NTRS)
Liu, Kuojuey Ray
1990-01-01
Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
Tomcin, Stephanie; Kelsch, Annette; Staff, Roland H; Landfester, Katharina; Zentel, Rudolf; Mailänder, Volker
2016-04-15
We describe a method in which polymeric nanoparticles stabilized with (2-hydroxypropyl)methacrylamide (HPMA)-based block copolymers are used as drug delivery systems for the fast release of a hydrophobic molecule and the controlled release of an amphiphilic one. The versatile miniemulsion solvent-evaporation technique was used to prepare polystyrene (PS) as well as poly-d/l-lactide (PDLLA) nanoparticles. Covalently bound or physically adsorbed fluorescent dyes labeled the particles' core and their block copolymer corona. Confocal laser scanning microscopy (CLSM) in combination with flow cytometry measurements was applied to demonstrate the burst release of a fluorescent hydrophobic drug model without the necessity of nanoparticle uptake. In addition, CLSM studies and quantitative calculations using the image processing program Volocity® show the intracellular detachment of the amphiphilic block copolymer from the particles' core after uptake. Our findings offer the possibility to combine the advantages of a fast release of a hydrophobic molecule and a controlled release of an amphiphilic one, pointing to the possibility of 'multi-step and multi-site' targeting by one nanocarrier. We describe thoroughly how different components of a nanocarrier end up in cells. This enables different cargos of a nanocarrier to have a consecutive release and delivery of distinct components. Most interestingly, we demonstrate individual kinetics of distinct components of such a system: first, the release of a fluorescent hydrophobic drug model on contact with the cell membrane, without the necessity of nanoparticle uptake; second, the intracellular detachment of the amphiphilic block copolymer from the particles' core after uptake.
This offers the possibility to combine the advantages of a fast release of a hydrophobic substance at the time of interaction of the nanoparticle with the cell surface and a controlled release of an amphiphilic molecule later on, pointing to the possibility of 'multi-step and multi-site' targeting by one nanocarrier. We therefore feel that this could be used for many cellular systems where the combined and orchestrated delivery of components is a prerequisite for obtaining the highest efficiency. Copyright © 2016 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Adaptive Numerical Algorithms in Space Weather Modeling
NASA Technical Reports Server (NTRS)
Toth, Gabor; vanderHolst, Bart; Sokolov, Igor V.; DeZeeuw, Darren; Gombosi, Tamas I.; Fang, Fang; Manchester, Ward B.; Meng, Xing; Nakib, Dalal; Powell, Kenneth G.;
2010-01-01
Space weather describes the various processes in the Sun-Earth system that present danger to human health and technology. The goal of space weather forecasting is to provide an opportunity to mitigate these negative effects. Physics-based space weather modeling is characterized by disparate temporal and spatial scales as well as by different physics in different domains. A multi-physics system can be modeled by a software framework comprising several components. Each component corresponds to a physics domain, and each component is represented by one or more numerical models. The publicly available Space Weather Modeling Framework (SWMF) can execute and couple together several components distributed over a parallel machine in a flexible and efficient manner. The framework also allows resolving disparate spatial and temporal scales with independent spatial and temporal discretizations in the various models. Several of the computationally most expensive domains of the framework are modeled by the Block-Adaptive Tree Solar wind Roe Upwind Scheme (BATS-R-US) code that can solve various forms of the magnetohydrodynamics (MHD) equations, including Hall, semi-relativistic, multi-species and multi-fluid MHD, anisotropic pressure, radiative transport and heat conduction. Modeling disparate scales within BATS-R-US is achieved by a block-adaptive mesh both in Cartesian and generalized coordinates. Most recently we have created a new core for BATS-R-US: the Block-Adaptive Tree Library (BATL) that provides a general toolkit for creating, load balancing and message passing in a 1, 2 or 3 dimensional block-adaptive grid. We describe the algorithms of BATL and demonstrate its efficiency and scaling properties for various problems.
BATS-R-US uses several time-integration schemes to address multiple time-scales: explicit time stepping with fixed or local time steps, partially steady-state evolution, point-implicit, semi-implicit, explicit/implicit, and fully implicit numerical schemes. Depending on the application, we find that different time stepping methods are optimal. Several of the time integration schemes exploit the block-based granularity of the grid structure. The framework and the adaptive algorithms enable physics based space weather modeling and even forecasting.
NASA Astrophysics Data System (ADS)
Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.
2017-12-01
StagBL is an open-source parallel solver and discretization library for geodynamic simulation, encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers. It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran. On top of this abstraction, tools are available to define boundary conditions and interact with particle systems. Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes. By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science. By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, manycore-, and GPU-optimized variants of matrix-free operators and multigrid components. The common layer provides a framework upon which to introduce innovative new tools. StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool. These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered grid abstraction. We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo.
Central to StagBL is the notion of an uninterrupted pipeline from toy/teaching codes to high-performance, extreme-scale solves. StagBLDemo replicates the functionality of an advanced MATLAB-style regional geodynamics code, thus providing users with a concrete procedure to exceed the performance and scalability limitations of smaller-scale tools.
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Rixen, Daniel
1996-01-01
We present an optimal preconditioning algorithm that is equally applicable to the dual (FETI) and primal (Balancing) Schur complement domain decomposition methods, and which successfully addresses the problems of subdomain heterogeneities including the effects of large jumps of coefficients. The proposed preconditioner is derived from energy principles and embeds a new coarsening operator that propagates the error globally and accelerates convergence. The resulting iterative solver is illustrated with the solution of highly heterogeneous elasticity problems.
CFD Methods and Tools for Multi-Element Airfoil Analysis
NASA Technical Reports Server (NTRS)
Rogers, Stuart E.; George, Michael W. (Technical Monitor)
1995-01-01
This lecture will discuss the computational tools currently available for high-lift multi-element airfoil analysis. It will present an overview of a number of different numerical approaches, their current capabilities, shortcomings, and computational costs. The lecture will be limited to viscous methods, including inviscid/boundary-layer coupling methods and incompressible and compressible Reynolds-averaged Navier-Stokes methods. Both structured and unstructured grid generation approaches will be presented. Two different structured grid procedures are outlined: one uses multi-block patched grids, the other overset chimera grids. Turbulence and transition modeling will be discussed.
Evaluation of non-selective refocusing pulses for 7 T MRI
Moore, Jay; Jankiewicz, Marcin; Anderson, Adam W.; Gore, John C.
2011-01-01
There is a continuing need for improved RF pulses that achieve proper refocusing in the context of ultra-high field (≥ 7 T) human MRI. Simple block or sinc pulses are highly susceptible to RF field inhomogeneities, and adiabatic pulses are generally considered too SAR intensive for practical use at 7 T. The performance of the array of pulses falling between these extremes, however, has not been systematically evaluated. The aim of this work was to compare the performances of 21 non-selective refocusing pulses spanning a range of durations and SAR levels. The evaluation was based upon simulations and both phantom and in vivo human brain experiments conducted at 7 T. Tested refocusing designs included block, composite block, BIR-4, hyperbolic secant, and numerically optimized composite waveforms. These pulses were divided into three SAR classes and two duration categories, and, based on signal gain in a 3-D spin echo sequence, practical recommendations on usage are made within each category. All evaluated pulses were found to produce greater volume-averaged signals relative to a 180° block pulse. Although signal gains often come with the price of increased SAR or duration, some pulses were found to result in significant signal enhancement while also adhering to practical constraints. This work demonstrates the signal gains and losses realizable with single-channel refocusing pulse designs and should assist in the selection of suitable refocusing pulses for practical 3-D spin-echo imaging at 7 T. It further establishes a reference against which future pulses and multi-channel designs can be compared. PMID:22177384
Novel Self-Assembling Amino Acid-Derived Block Copolymer with Changeable Polymer Backbone Structure.
Koga, Tomoyuki; Aso, Eri; Higashi, Nobuyuki
2016-11-29
Block copolymers have attracted much attention in recent years as potentially interesting building blocks for the development of novel nanostructured materials. Herein, we report a new type of self-assembling block copolymer with a changeable polymer backbone structure, poly(Fmoc-Ser ester)-b-PSt, which was synthesized by combining the polycondensation of 9-fluorenylmethoxycarbonyl-serine (Fmoc-Ser) with the reversible addition-fragmentation chain transfer (RAFT) polymerization of styrene (St). This block copolymer showed direct conversion of the backbone structure from polyester to polypeptide through a multi O,N-acyl migration triggered by base-induced deprotection of the Fmoc groups in organic solvent. Such polymer-to-polymer conversion was found to occur quantitatively, without a decrease in the degree of polymerization, and to cause a drastic change in the self-assembling properties of the block copolymer. On the basis of several morphological analyses using FTIR spectroscopy, atomic force microscopy, and transmission and scanning electron microscopies, the resulting peptide block copolymer was found to self-assemble into a vesicle-like hollow nanosphere with a relatively uniform diameter of ca. 300 nm in toluene. In this case, the peptide block generated from the polyester formed a β-sheet structure, indicating self-assembly via a peptide-guided route. We believe the findings presented in this study offer a new concept for the development of self-assembling block copolymer systems.
Intelligent Agent Transparency in Human-Agent Teaming for Multi-UxV Management.
Mercado, Joseph E; Rupp, Michael A; Chen, Jessie Y C; Barnes, Michael J; Barber, Daniel; Procci, Katelyn
2016-05-01
We investigated the effects of level of agent transparency on operator performance, trust, and workload in a context of human-agent teaming for multirobot management. Participants played the role of a heterogeneous unmanned vehicle (UxV) operator and were instructed to complete various missions by giving orders to UxVs through a computer interface. An intelligent agent (IA) assisted the participant by recommending two plans, a top recommendation and a secondary recommendation, for every mission. A within-subjects design with three levels of agent transparency was employed in the present experiment. There were eight missions in each of three experimental blocks, grouped by level of transparency. During each experimental block, the IA was incorrect three out of eight times due to external information (e.g., commander's intent and intelligence). Operator performance, trust, workload, and usability data were collected. Results indicate that operator performance, trust, and perceived usability increased as a function of transparency level. Subjective and objective workload data indicate that participants' workload did not increase as a function of transparency. Furthermore, response time did not increase as a function of transparency. Unlike previous research, which showed that increased transparency resulted in increased performance and trust calibration at the cost of greater workload and longer response time, our results support the benefits of transparency for performance effectiveness without additional costs. The current results will facilitate the implementation of IAs in military settings and will provide useful data to the design of heterogeneous UxV teams. © 2016, Human Factors and Ergonomics Society.
Amino Acid Block Copolymers with Broad Antimicrobial Activity and Barrier Properties.
Bevilacqua, Michael P; Huang, Daniel J; Wall, Brian D; Lane, Shalyn J; Edwards, Carl K; Hanson, Jarrod A; Benitez, Diego; Solomkin, Joseph S; Deming, Timothy J
2017-10-01
Antimicrobial properties of a long-chain, synthetic, cationic, and hydrophobic amino acid block copolymer are reported. In 5 and 60 min time-kill assays, solutions of K₁₀₀L₄₀ block copolymers (poly(L-lysine·hydrochloride)₁₀₀-b-poly(L-leucine)₄₀) at concentrations of 10-100 µg mL⁻¹ show multi-log reductions in colony forming units of Gram-positive and Gram-negative bacteria, as well as yeast, including multidrug-resistant strains. Driven by association of the hydrophobic segments, K₁₀₀L₄₀ copolymers form viscous solutions and self-supporting hydrogels in water at concentrations of 1 and 2 wt%, respectively. These K₁₀₀L₄₀ preparations provide an effective barrier to microbial contamination of wounds, as measured by multi-log decreases of tissue-associated bacteria after deliberate inoculation of porcine skin explants, porcine open wounds, and rodent closed wounds with foreign body. Based on these findings, amino acid copolymers with the features of K₁₀₀L₄₀ can combine potent, direct antimicrobial activity and barrier properties in one biopolymer for a new approach to the prevention of wound infections. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mesozoic Deformation and Its Geological Significance in the Southern Margin of the South China Sea
NASA Astrophysics Data System (ADS)
Zhu, Rongwei; Liu, Hailing; Yao, Yongjian; Wang, Yin
2018-05-01
The pre-Eocene history of the region around the present South China Sea is not well known. New multi-channel seismic profiles provide valuable insights into the probable Mesozoic history of this region. Detailed structural and stratigraphic interpretations of the multi-channel seismic profiles, calibrated with relevant drilling and dredging data, show major Mesozoic structural features. A structural restoration was done to remove the Cenozoic tectonic influence and calculate the Mesozoic tectonic compression ratios. The results indicate that two groups of compressive stress with diametrically opposite orientations, S(S)E-N(N)W and N(N)W-S(S)E, were active during the Mesozoic. The compression ratio values gradually decrease from north to south and from west to east in each stress orientation. These phenomena may be related to the opening of the proto-South China Sea (then located to the south of the Nansha block) and the rate at which the Nansha block drifted northward in the late Jurassic to late Cretaceous. The Nansha block drifted northward until it collided and sutured with the southern China margin. The opening of the present South China Sea may be related to this suture zone, which was a tectonic zone of weakness.
A Multi-Objective Optimization Technique to Model the Pareto Front of Organic Dielectric Polymers
NASA Astrophysics Data System (ADS)
Gubernatis, J. E.; Mannodi-Kanakkithodi, A.; Ramprasad, R.; Pilania, G.; Lookman, T.
Multi-objective optimization is an area of decision making that is concerned with mathematical optimization problems involving more than one objective simultaneously. Here we describe two new Monte Carlo methods for this type of optimization in the context of their application to the problem of designing polymers with more desirable dielectric and optical properties. We present results of applying these Monte Carlo methods to a two-objective problem (maximizing the total static band dielectric constant and energy gap) and a three objective problem (maximizing the ionic and electronic contributions to the static band dielectric constant and energy gap) of a 6-block organic polymer. Our objective functions were constructed from high throughput DFT calculations of 4-block polymers, following the method of Sharma et al., Nature Communications 5, 4845 (2014) and Mannodi-Kanakkithodi et al., Scientific Reports, submitted. Our high throughput and Monte Carlo methods of analysis extend to general N-block organic polymers. This work was supported in part by the LDRD DR program of the Los Alamos National Laboratory and in part by a Multidisciplinary University Research Initiative (MURI) Grant from the Office of Naval Research.
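The multi-objective setting described above reduces, at its core, to extracting the non-dominated (Pareto-optimal) points from sampled candidates. A minimal sketch, assuming a simple brute-force filter; the objective values below are made-up illustrations, not DFT results:

```python
def pareto_front(points):
    """Return the non-dominated subset when every objective is to be maximized."""
    def dominated(p):
        # q dominates p if q is >= p in every objective and differs in at least one.
        return any(q != p and all(qi >= pi for qi, pi in zip(q, p)) for q in points)
    return [p for p in points if not dominated(p)]

# Hypothetical (dielectric constant, band gap) pairs for candidate polymers.
candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (4.0, 1.0)]
print(pareto_front(candidates))  # (2.0, 2.0) is dominated by (3.0, 3.0)
```

The brute-force filter is O(n²); in a Monte Carlo search one would typically maintain the front incrementally as new candidates are sampled.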
A privacy-preserving parallel and homomorphic encryption scheme
NASA Astrophysics Data System (ADS)
Min, Zhaoe; Yang, Geng; Shi, Jingqi
2017-04-01
In order to protect data privacy whilst allowing efficient access to data in multi-node cloud environments, a parallel homomorphic encryption (PHE) scheme is proposed based on the additive homomorphism of the Paillier encryption algorithm. In this paper we propose a PHE algorithm in which the plaintext is divided into several blocks that are encrypted in parallel. Experimental results demonstrate that the encryption algorithm can reach a speed-up ratio of about 7.1 in a MapReduce environment with 16 cores and 4 nodes.
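As a concrete illustration of the idea (not the paper's implementation), the following sketch encrypts independent plaintext blocks in parallel using a toy Paillier instance. The tiny primes, fixed randomness, and thread pool are assumptions for demonstration only and are cryptographically insecure:

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy Paillier parameters (insecure; for illustration only).
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because L(g^lam mod n^2) = lam mod n when g = n + 1

def encrypt(m, r=5):
    # E(m) = g^m * r^n mod n^2 (r fixed here; it must be random in practice).
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

def encrypt_blocks(blocks, workers=4):
    # Blocks are independent, so encryption parallelizes trivially.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(encrypt, blocks))

blocks = [12, 34, 56]  # plaintext split into blocks
cts = encrypt_blocks(blocks)
print([decrypt(c) for c in cts])  # -> [12, 34, 56]
# Additive homomorphism: E(m1) * E(m2) decrypts to m1 + m2 (mod n).
print(decrypt((encrypt(12) * encrypt(34)) % n2))  # -> 46
```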
ERIC Educational Resources Information Center
Saktanli, S. Cem
2011-01-01
This experimental study was done to see if using a computer-supported notation and vocalization program for teaching songs, instead of block-flute-accompanied song teaching, has any significant effect on students' singing behavior. The study group is composed of the 5th, 6th and 7th graders of the 2008-2009 educational term in T.O.K.I. Yahya Kemal…
NASA Technical Reports Server (NTRS)
Mineck, Raymond E.; Thomas, James L.; Biedron, Robert T.; Diskin, Boris
2005-01-01
FMG3D (full multigrid 3 dimensions) is a pilot computer program that solves equations of fluid flow using a finite difference representation on a structured grid. Infrastructure exists for three dimensions but the current implementation treats only two dimensions. Written in Fortran 90, FMG3D takes advantage of the recursive subroutine feature, dynamic memory allocation, and structured-programming constructs of that language. FMG3D supports multi-block grids with three types of block-to-block interfaces: periodic, C-zero, and C-infinity. For all three types, grid points must match at interfaces. For periodic and C-infinity types, derivatives of grid metrics must be continuous at interfaces. The available equation sets are as follows: scalar elliptic equations, scalar convection equations, and the pressure-Poisson formulation of the Navier-Stokes equations for an incompressible fluid. All the equation sets are implemented with nonzero forcing functions to enable the use of user-specified solutions to assist in verification and validation. The equations are solved with a full multigrid scheme using a full approximation scheme to converge the solution on each succeeding grid level. Restriction to the next coarser mesh uses direct injection for variables and full weighting for residual quantities; prolongation of the coarse grid correction from the coarse mesh to the fine mesh uses bilinear interpolation; and prolongation of the coarse grid solution uses bicubic interpolation.
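The restriction/prolongation cycle described above can be sketched in one dimension: full weighting of the residual onto the coarse grid, an exact tridiagonal coarse solve standing in for the recursion, and linear interpolation of the correction back to the fine grid. This is a toy analogue under those assumptions, not FMG3D itself:

```python
def smooth(u, f, h, sweeps, w=2.0/3.0):
    # Weighted-Jacobi smoothing for -u'' = f with homogeneous Dirichlet BCs.
    for _ in range(sweeps):
        un = u[:]
        for i in range(1, len(u) - 1):
            un[i] = (1 - w) * u[i] + w * 0.5 * (u[i-1] + u[i+1] + h*h*f[i])
        u = un
    return u

def residual(u, f, h):
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2*u[i] - u[i-1] - u[i+1]) / (h*h)
    return r

def restrict_fw(r):
    # Full weighting of the residual onto the coarse grid (1/4, 1/2, 1/4 stencil).
    n_c = (len(r) - 1) // 2
    return [0.0] + [0.25*r[2*i-1] + 0.5*r[2*i] + 0.25*r[2*i+1] for i in range(1, n_c)] + [0.0]

def prolong(e):
    # Linear interpolation of the coarse-grid correction to the fine grid.
    fine = [0.0] * (2*len(e) - 1)
    fine[::2] = e
    for i in range(1, len(fine) - 1, 2):
        fine[i] = 0.5 * (fine[i-1] + fine[i+1])
    return fine

def solve_tridiag(f, h):
    # Exact coarse-grid solve of -u'' = f by the Thomas algorithm.
    m = len(f) - 2
    diag = [2.0/(h*h)] * m
    rhs = list(f[1:-1])
    off = -1.0/(h*h)
    for i in range(1, m):
        w = off / diag[i-1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i-1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i+1]) / diag[i]
    return [0.0] + u + [0.0]

def two_grid_cycle(u, f, h):
    u = smooth(u, f, h, 3)                                   # pre-smoothing
    e = solve_tridiag(restrict_fw(residual(u, f, h)), 2*h)   # coarse correction
    u = [ui + ei for ui, ei in zip(u, prolong(e))]
    return smooth(u, f, h, 3)                                # post-smoothing

n, h = 64, 1.0/64
f = [1.0] * (n + 1)
u = two_grid_cycle([0.0] * (n + 1), f, h)
r1 = max(abs(x) for x in residual(u, f, h))
print(f"max residual after one cycle: {r1:.3g}")  # starts at 1.0 before the cycle
```

With f = 1 the exact solution is u(x) = x(1-x)/2, so one cycle should leave the midpoint value close to 0.125 while sharply reducing the residual.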
Magnet design for a low-emittance storage ring
Johansson, Martin; Anderberg, Bengt; Lindgren, Lars-Johan
2014-01-01
The MAX IV 3 GeV storage ring, currently under construction, pursues the goal of low electron beam emittance by using a multi-bend achromat magnet lattice, which is realised by having several consecutive magnet elements precision-machined out of a common solid iron block, 2.3–3.4 m long. With this magnet design solution, instead of having 1320 individual magnets, the MAX IV 3 GeV storage ring is built up using 140 integrated ‘magnet block’ units, containing all these magnet elements. Major features of this magnet block design are compactness, vibration stability and that the alignment of magnet elements within each unit is given by the mechanical accuracy of the CNC machining rather than individual field measurement and adjustment. This article presents practical engineering details of implementing this magnet design solution, and mechanical + magnetic field measurement results from the magnet production series. At the time of writing (spring 2014), the production series, which is totally outsourced to industry, is roughly half way through, with mechanical/magnetic QA conforming to specifications. It is the conclusion of the authors that the MAX IV magnet block concept, which has sometimes been described as new or innovative, is from a manufacturing point of view simply a collection of known mature production methods and measurement procedures, which can be executed at fixed cost with a low level of risk. PMID:25177980
Internal Passage Heat Transfer Prediction Using Multiblock Grids and a Kappa-Omega Turbulence Model
NASA Technical Reports Server (NTRS)
Rigby, David L.; Ameri, Ali A.; Steinthorsson, Erlendur
1996-01-01
Numerical simulations of the three-dimensional flow and heat transfer in a rectangular duct with a 180° bend were performed. Results are presented for Reynolds numbers of 17,000 and 37,000 and for aspect ratios of 0.5 and 1.0. A kappa-omega turbulence model with no reference to distance to a wall is used. Direct comparisons between single-block and multi-block grid calculations are made. Heat transfer and velocity distributions are compared to the available literature with good agreement. The multi-block grid system is seen to produce more accurate results than a single-block grid with the same number of cells.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hathaway, M.D.; Wood, J.R.
1997-10-01
CFD codes capable of utilizing multi-block grids provide the capability to analyze the complete geometry of centrifugal compressors. Attendant with this increased capability is potentially increased grid setup time and more computational overhead, with a resultant increase in wall clock time to obtain a solution. If the increase in difficulty of obtaining a solution significantly improves the solution over that obtained by modeling the features of the tip clearance flow or the typical bluntness of a centrifugal compressor's trailing edge, then the additional burden is worthwhile. However, if the additional information obtained is of marginal use, then modeling of certain features of the geometry may provide reasonable solutions for designers to make comparative choices when pursuing a new design. In this spirit, a sequence of grids was generated to study the relative importance of modeling versus detailed gridding of the tip gap and blunt trailing edge regions of the NASA large low-speed centrifugal compressor, for which considerable detailed internal laser anemometry data are available for comparison. The results indicate: (1) There is no significant difference in predicted tip clearance mass flow rate whether the tip gap is gridded or modeled. (2) Gridding rather than modeling the trailing edge results in better predictions of some flow details downstream of the impeller, but otherwise appears to offer no great benefits. (3) The pitchwise variation of absolute flow angle decreases rapidly up to 8% impeller radius ratio and much more slowly thereafter. Although some improvements in prediction of flow field details are realized as a result of analyzing the actual geometry, there is no clear consensus that any of the grids investigated produced superior results in every case when compared to the measurements.
However, if a multi-block code is available, it should be used, as it has the propensity for enabling better predictions than a single block code.« less
NASA Astrophysics Data System (ADS)
Matsumoto, Naoya; Okazaki, Shigetoshi; Takamoto, Hisayoshi; Inoue, Takashi; Terakawa, Susumu
2014-02-01
We propose a method for high precision modulation of the pupil function of a microscope objective lens to improve the performance of multifocal multi-photon microscopy (MMM). To modulate the pupil function, we adopt a spatial light modulator (SLM) and place it at the conjugate position of the objective lens. The SLM can generate an arbitrary number of spots to excite the multiple fluorescence spots (MFS) at the desired positions and intensities by applying an appropriate computer-generated hologram (CGH). This flexibility allows us to control the MFS according to the photobleaching level of a fluorescent protein and phototoxicity of a specimen. However, when a large number of excitation spots are generated, the intensity distribution of the MFS is significantly different from the one originally designed due to misalignment of the optical setup and characteristics of the SLM. As a result, the image of a specimen obtained using laser scanning for the MFS has block noise segments because the SLM could not generate a uniform MFS. To improve the intensity distribution of the MFS, we adaptively redesigned the CGH based on the observed MFS. We experimentally demonstrate an improvement in the uniformity of a 10 × 10 MFS grid using a dye solution. The simplicity of the proposed method will allow it to be applied for calibration of MMM before observing living tissue. After the MMM calibration, we performed laser scanning with two-photon excitation to observe a real specimen without detecting block noise segments.
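The adaptive CGH redesign described above can be sketched as a simple weight-update loop: spots that come out too dim in the observed MFS are given proportionally larger design weights in the next hologram computation, driving the grid toward uniformity. This is a minimal illustrative sketch only; the function name, the gain exponent, and the exact update rule are assumptions, not the authors' calibration procedure.

```python
import numpy as np

def update_spot_weights(target, measured, gain=0.5):
    """Adaptive correction for multifocal spot uniformity.

    Each spot's design weight is nudged up or down by the ratio of the
    mean measured intensity to that spot's measured intensity; the
    updated weights feed the next CGH computation. The exponent `gain`
    damps the correction to keep the iteration stable (hypothetical).
    """
    measured = np.asarray(measured, dtype=float)
    correction = (measured.mean() / measured) ** gain
    new_target = np.asarray(target, dtype=float) * correction
    return new_target / new_target.sum()  # renormalize total power

# Example: a 4-spot design where spot 2 was observed 40% too dim.
weights = update_spot_weights([0.25, 0.25, 0.25, 0.25],
                              [1.0, 1.0, 0.6, 1.0])
```

Iterating this update against freshly measured spot intensities is the essence of redesigning the CGH "based on the observed MFS."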
Synergistic High Charge-Storage Capacity for Multi-level Flexible Organic Flash Memory
NASA Astrophysics Data System (ADS)
Kang, Minji; Khim, Dongyoon; Park, Won-Tae; Kim, Jihong; Kim, Juhwan; Noh, Yong-Young; Baeg, Kang-Jun; Kim, Dong-Yu
2015-07-01
Electret and organic floating-gate memories are next-generation flash storage media for printed organic complementary circuits. While each type of flash memory can be easily fabricated using solution processes on flexible plastic substrates, their potential for on-chip memory integration is limited by unreliable bit operation and high write loads. Here we report a new architecture that improves the overall performance of organic memory and, in particular, achieves the high charge storage needed for multi-level operation. Our concept relies on the synergistic electrical characteristics of a polymer electret (poly(2-vinyl naphthalene), PVN) combined with metal (copper) nanoparticles. It is distinguished from most organic nano-floating-gate memories by using the electret dielectric, instead of a conventional tunneling dielectric, for additional charge storage. The uniform stacking of organic layers, including various dielectrics and poly(3-hexylthiophene) (P3HT) as the organic semiconductor, followed by thin-film coating using orthogonal solvents, greatly improves device precision despite easy and fast manufacture. Poly(vinylidene fluoride-trifluoroethylene) [P(VDF-TrFE)] as a high-k blocking dielectric also allows a reduction of the programming voltage. The reported synergistic organic memory devices exhibit low power consumption, high cycle endurance, high thermal stability, and suitable retention time compared to electret and organic nano-floating-gate memory devices.
Hydrophobic duck feathers and their simulation on textile substrates for water repellent treatment.
Liu, Yuyang; Chen, Xianqiong; Xin, J H
2008-12-01
Inspired by the non-wetting phenomena of duck feathers, the water repellent property of duck feathers was studied at the nanoscale. The microstructures of the duck feather were investigated by a scanning electron microscope (SEM) imaging method through a step-by-step magnifying procedure. The SEM results show that duck feathers have a multi-scale structure and that this multi-scale structure, as well as the preening oil, is responsible for their superhydrophobic behavior. The microstructures of the duck feather were simulated on textile substrates using the biopolymer chitosan as building blocks through a novel surface solution precipitation (SSP) method, and the textile substrates were then further modified with a silicone compound to achieve low surface energy. The resultant textiles exhibit super water repellent properties, thus providing a simple bionic way to create superhydrophobic surfaces on soft substrates using flexible materials as building blocks.
Polar order in nanostructured organic materials
NASA Astrophysics Data System (ADS)
Sayar, M.; Olvera de la Cruz, M.; Stupp, S. I.
2003-02-01
Achiral multi-block liquid crystals are not expected to form polar domains. Recently, however, films of nanoaggregates formed by multi-block rodcoil molecules were identified as the first example of achiral single-component materials with macroscopic polar properties. By solving an Ising-like model with dipolar and asymmetric short-range interactions, we show here that polar domains are stable in films composed of aggregates as opposed to isolated molecules. Unlike classical molecular systems, these nanoaggregates have large intralayer spacings (a ≈ 8 nm), leading to a reduction in the repulsive dipolar interactions which oppose polar order within layers. In finite-thickness films of nanostructures, this effect enables the formation of polar domains. We compute exactly the energies of the possible structures consistent with the experiments as a function of film thickness at zero temperature (T). We also provide Monte Carlo simulations at non-zero T for a disordered hexagonal lattice that resembles the smectic-like packing in these nanofilms.
Liang, Fei; Hongxin, Li; Zhang, Hai-Zhou; Wenbin, Guo; Zou, Cheng-Wei; Farhaj, Zeeshan
2017-04-17
Percutaneous device closure of a wide-spaced multi-hole perimembranous ventricular septal defect (PmVSD) is difficult to accomplish. This study evaluates the feasibility, safety, and efficacy of perventricular device closure of wide-spaced multi-hole PmVSDs using a double-device implanting technique. Sixteen patients with wide-spaced multi-hole PmVSDs underwent perventricular closure with two devices through an inferior median sternotomy approach under transesophageal echocardiographic guidance. The largest hole and its adjacent small holes were occluded with an optimally sized device; the far-away residual hole was occluded with a second device using a probe-assisted delivery system. All patients were followed up for 1 to 4 years to assess residual shunt, atrioventricular block, and adjacent valvular function. Each PmVSD had 2 to 4 holes. The maximum distance between the holes was 5.0 to 10.0 mm (median, 6.4 mm), and the diameter of the largest hole was 2.5 to 7.0 mm (median, 3.6 mm). The success rate of double-device closure was 100%. Immediate residual shunts were found in 6 patients (38%), and incomplete right bundle branch block at discharge occurred in 3 cases (19%); both complications decreased to 6% at 1-year follow-up. No patient had a severe device-related complication. Perventricular closure of a wide-spaced multi-hole PmVSD using a double-device implanting technique is feasible, safe, and efficacious. In multi-hole PmVSDs where the distance between the holes exceeds 5 mm, double-device implantation may achieve complete occlusion.
Deterministic secure quantum communication using a single d-level system.
Jiang, Dong; Chen, Yuanyuan; Gu, Xuemei; Xie, Ling; Chen, Lijun
2017-03-22
Deterministic secure quantum communication (DSQC) can transmit secret messages between two parties without first generating a shared secret key. Compared with quantum key distribution (QKD), DSQC avoids the waste of qubits arising from basis reconciliation and thus reaches higher efficiency. In this paper, based on data block transmission and order rearrangement technologies, we propose a DSQC protocol. It utilizes a set of single d-level systems as message carriers, which are used to directly encode the secret message in one communication process. Theoretical analysis shows that these employed technologies guarantee the security, and the use of a higher dimensional quantum system makes our protocol achieve higher security and efficiency. Since only quantum memory is required for implementation, our protocol is feasible with current technologies. Furthermore, Trojan horse attack (THA) is taken into account in our protocol. We give a THA model and show that THA significantly increases the multi-photon rate and can thus be detected.
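The block-transmission and order-rearrangement idea underlying the protocol can be illustrated classically: the sender transmits the carriers of a data block in a secret pseudo-random order, and the receiver restores the original order once the permutation is revealed through the authenticated channel. This is a purely classical sketch of the reordering step with hypothetical function names; it does not capture the d-level quantum encoding or the security analysis.

```python
import random

def rearrange(block, seed):
    """Sender side: emit the symbols of one data block in a secret,
    pseudo-randomly rearranged order. The seed stands in for the
    shared control information (hypothetical simplification)."""
    order = list(range(len(block)))
    random.Random(seed).shuffle(order)
    return [block[i] for i in order], order

def restore(received, order):
    """Receiver side: undo the rearrangement once the order is known."""
    block = [None] * len(received)
    for slot, src in enumerate(order):
        block[src] = received[slot]
    return block

message = [3, 1, 4, 1, 5, 9, 2, 6]   # d-level symbols, here d = 10
scrambled, order = rearrange(message, seed=42)
recovered = restore(scrambled, order)
```

An eavesdropper who intercepts the scrambled block without the permutation learns only an unordered multiset of symbols, which is the intuition the protocol builds on.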
Collapse transitions in thermosensitive multi-block copolymers: A Monte Carlo study
NASA Astrophysics Data System (ADS)
Rissanou, Anastassia N.; Tzeli, Despoina S.; Anastasiadis, Spiros H.; Bitsanis, Ioannis A.
2014-05-01
Monte Carlo simulations are performed on a simple cubic lattice to investigate the behavior of a single linear multiblock copolymer chain of various lengths N. The chain, of type (AnBn)m, consists of alternating A and B blocks, where the A blocks are solvophilic, the B blocks are solvophobic, and N = 2nm. The conformations are classified into five cases of globule formation by the solvophobic blocks of the chain. The dependence of globule characteristics on the molecular weight and on the number of blocks participating in their formation is examined. The focus is on relatively high molecular weights (i.e., N in the range of 500-5000 units) and very different energetic conditions for the two blocks (a very good, almost athermal, solvent for A and a bad solvent for B). A rich phase behavior is observed as a result of the alternating architecture of the multiblock copolymer chain. We trust that thermodynamic equilibrium has been reached for chains of N up to 2000 units; for longer chains, however, kinetic entrapments are observed. The comparison among equivalent globules consisting of different numbers of B blocks shows that the more solvophobic blocks a globule contains, the larger its radius of gyration and the looser its structure. Comparisons between globules formed by the solvophobic blocks of the multiblock copolymer chain and their homopolymer analogs highlight the important role of the solvophilic A blocks.
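Two ingredients of such a lattice simulation are easy to sketch: the (AnBn)m monomer sequence, with N = 2nm, and the standard Metropolis acceptance rule used to accept or reject trial chain moves. This is a minimal sketch under stated assumptions; the function names are hypothetical and the actual study uses full three-dimensional lattice moves and energetics.

```python
import math
import random

def multiblock_sequence(n, m):
    """Monomer sequence of an (A_n B_n)_m multiblock copolymer:
    m repeats of n solvophilic A units followed by n solvophobic B
    units, giving a chain of N = 2*n*m monomers."""
    return ("A" * n + "B" * n) * m

def metropolis_accept(dE, T, rng=random.Random(0)):
    """Standard Metropolis criterion for lattice Monte Carlo:
    accept a trial move with probability min(1, exp(-dE/T))."""
    return dE <= 0 or rng.random() < math.exp(-dE / T)

chain = multiblock_sequence(n=5, m=3)   # N = 2*5*3 = 30 monomers
```

Classifying which B blocks share a globule then amounts to clustering the solvophobic monomers of `chain` by lattice contacts in a given conformation.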
Incompressible SPH (ISPH) with fast Poisson solver on a GPU
NASA Astrophysics Data System (ADS)
Chow, Alex D.; Rogers, Benedict D.; Lind, Steven J.; Stansby, Peter K.
2018-05-01
This paper presents a fast incompressible SPH (ISPH) solver implemented to run entirely on a graphics processing unit (GPU), capable of simulating several million particles in three dimensions on a single GPU. The ISPH algorithm is implemented by converting the highly optimised open-source weakly-compressible SPH (WCSPH) code DualSPHysics to run ISPH on the GPU, combining it with the open-source linear algebra library ViennaCL for fast solutions of the pressure Poisson equation (PPE). Several challenges are addressed in this research: constructing a PPE matrix every timestep on the GPU for moving particles, optimising the limited GPU memory, and exploiting fast matrix solvers. The ISPH pressure projection algorithm is implemented as four separate stages, each with a particle sweep, including an algorithm for the population of the PPE matrix suitable for the GPU, and mixed precision storage methods. An accurate and robust ISPH boundary condition, ideal for parallel processing, is also established by adapting an existing WCSPH boundary condition for ISPH. A variety of validation cases are presented: an impulsively started plate, incompressible flow around a moving square in a box, and dambreaks (2-D and 3-D), which demonstrate the accuracy, flexibility, and speed of the methodology. Fragmentation of the free surface is shown to influence the performance of matrix preconditioners and therefore the PPE matrix solution time. The Jacobi preconditioner demonstrates robustness and reliability in the presence of fragmented flows. For a dambreak simulation, the GPU achieves speedups of 10-18 times over single-threaded and 1.1-4.5 times over 16-threaded CPU run times.
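The role of the Jacobi (diagonal) preconditioner in the PPE solution can be illustrated with a minimal Jacobi-preconditioned conjugate gradient iteration in NumPy. This is only a CPU sketch for a small symmetric positive definite system; the actual solver runs on the GPU through ViennaCL on a far larger matrix, and the choice of CG here is illustrative rather than the paper's exact Krylov method.

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradients with a Jacobi (diagonal) preconditioner.
    A must be symmetric positive definite; the preconditioner is just
    the inverse of the diagonal, which is cheap and, as the paper
    reports, robust for fragmented flows."""
    Minv = 1.0 / np.diag(A)              # Jacobi preconditioner
    x = np.zeros_like(b)
    r = b - A @ x                        # initial residual
    z = Minv * r                         # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD system standing in for a PPE matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi_pcg(A, b)
```

Because the preconditioner is purely diagonal, every operation in the loop is a vector-wise or matrix-vector kernel, which is exactly what maps well onto a GPU.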
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bui, Quan M.; Wang, Lu; Osei-Kuffuor, Daniel
Multiphase flow is a critical process in a wide range of applications, including oil and gas recovery, carbon sequestration, and contaminant remediation. Numerical simulation of multiphase flow requires solving a large, sparse linear system resulting from the discretization of the partial differential equations modeling the flow. In the case of multiphase multicomponent flow with miscible effects, this is a very challenging task. The problem becomes even more difficult if phase transitions are taken into account. A new approach to handling phase transitions is to formulate the system as a nonlinear complementarity problem (NCP). Unlike in the primary variable switching technique, the set of primary variables in this approach is fixed even when there is a phase transition. Not only does this improve the robustness of the nonlinear solver, it opens up the possibility of using multigrid methods to solve the resulting linear system. The disadvantage of the complementarity approach, however, is that when a phase disappears, the linear system has the structure of a saddle point problem and becomes indefinite, and current algebraic multigrid (AMG) algorithms cannot be applied directly. In this study, we explore the effectiveness of a new multilevel strategy, based on the multigrid reduction technique, for dealing with problems of this type. We demonstrate the effectiveness of the method through numerical results for the case of two-phase, two-component flow with phase appearance/disappearance, and show that the strategy is efficient and scales optimally with problem size.
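The complementarity formulation can be illustrated with an NCP function such as the Fischer-Burmeister function, which turns the condition "a ≥ 0, b ≥ 0, a·b = 0" (e.g. a phase saturation complementary to an equilibrium residual) into a single smooth-ish residual equation, so the primary variable set stays fixed across phase appearance and disappearance. The use of Fischer-Burmeister specifically is an assumption for illustration; the paper may employ a different NCP function, such as the minimum function.

```python
import numpy as np

def fischer_burmeister(a, b):
    """Fischer-Burmeister NCP function:
        phi(a, b) = sqrt(a^2 + b^2) - a - b
    phi(a, b) = 0  <=>  a >= 0, b >= 0, and a*b = 0,
    so the phase-presence constraint can be folded directly into the
    nonlinear residual handed to a (semismooth) Newton solver."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(a * a + b * b) - a - b

# Complementarity holds when one of the pair is zero, the other >= 0:
phi_present = fischer_burmeister(0.0, 2.5)   # phase absent branch
phi_absent = fischer_burmeister(1.3, 0.0)    # phase present branch
phi_violated = fischer_burmeister(1.0, 1.0)  # both positive: nonzero
```

Each nonlinear iteration then linearizes this residual, producing the (possibly indefinite, saddle-point structured) systems that motivate the multigrid reduction strategy of the paper.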
Use of general purpose graphics processing units with MODFLOW
Hughes, Joseph D.; White, Jeremy T.
2013-01-01
To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete LU, modified-incomplete LU, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed the parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
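The compressed sparse row (CSR) scheme mentioned above stores only each row's nonzero values plus their column indices and per-row offsets; the sparse matrix-vector product at the heart of every PCG iteration then reads each row as a contiguous slice. A minimal NumPy sketch of that kernel follows (illustrative only, not the UPCG implementation, whose GPU version would execute one row per thread).

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a matrix stored in compressed sparse row (CSR)
    form: row i's nonzero values are data[indptr[i]:indptr[i+1]],
    located in columns indices[indptr[i]:indptr[i+1]]."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# 3x3 example matrix:
#   [[4, 1, 0],
#    [1, 3, 0],
#    [0, 0, 2]]
data    = np.array([4.0, 1.0, 1.0, 3.0, 2.0])
indices = np.array([0, 1, 0, 1, 2])
indptr  = np.array([0, 2, 4, 5])
y = csr_matvec(data, indices, indptr, np.array([1.0, 1.0, 1.0]))
```

Because each row's work is independent, this loop parallelizes naturally across GPGPU threads, which is one reason CSR is the storage scheme of choice for solvers like UPCG.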