Parallel and Portable Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
Lee, S. R.; Cummings, J. C.; Nolen, S. D.; Keen, N. D.
1997-08-01
We have developed a multi-group, Monte Carlo neutron transport code in C++ using object-oriented methods and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α eigenvalues of the neutron transport equation on a rectilinear computational mesh. It is portable to and runs in parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities are discussed, along with physics and performance results for several test problems on a variety of hardware, including all three Accelerated Strategic Computing Initiative (ASCI) platforms. Current parallel performance indicates the ability to compute α-eigenvalues in seconds or minutes rather than days or weeks. Current and future work on the implementation of a general transport physics framework (TPF) is also described. This TPF employs modern C++ programming techniques to provide simplified user interfaces, generic STL-style programming, and compile-time performance optimization. Physics capabilities of the TPF will be extended to include continuous energy treatments, implicit Monte Carlo algorithms, and a variety of convergence acceleration techniques such as importance combing.
Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD
NASA Technical Reports Server (NTRS)
Gropp, W. D.; Keyes, D. E.; McInnes, L. C.; Tidriri, M. D.
1998-01-01
Implicit solution methods are important in applications modeled by PDEs with disparate temporal and spatial scales. Because such applications require high resolution with reasonable turnaround, "routine" parallelization is essential. The pseudo-transient matrix-free Newton-Krylov-Schwarz (Psi-NKS) algorithmic framework is presented as an answer. We show that, for the classical problem of three-dimensional transonic Euler flow about an M6 wing, Psi-NKS can simultaneously deliver: globalized, asymptotically rapid convergence through adaptive pseudo- transient continuation and Newton's method-, reasonable parallelizability for an implicit method through deferred synchronization and favorable communication-to-computation scaling in the Krylov linear solver; and high per- processor performance through attention to distributed memory and cache locality, especially through the Schwarz preconditioner. Two discouraging features of Psi-NKS methods are their sensitivity to the coding of the underlying PDE discretization and the large number of parameters that must be selected to govern convergence. We therefore distill several recommendations from our experience and from our reading of the literature on various algorithmic components of Psi-NKS, and we describe a freely available, MPI-based portable parallel software implementation of the solver employed here.
Parallel computing techniques for rotorcraft aerodynamics
NASA Astrophysics Data System (ADS)
Ekici, Kivanc
The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS) originally used in TURNS, is performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method) is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
CFD Analysis and Design Optimization Using Parallel Computers
NASA Technical Reports Server (NTRS)
Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James
1997-01-01
A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
A Data Parallel Multizone Navier-Stokes Code
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)
1995-01-01
We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units
USDA-ARS?s Scientific Manuscript database
This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Halem, Milton (Technical Monitor)
2000-01-01
We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit arid can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
Implicit schemes and parallel computing in unstructured grid CFD
NASA Technical Reports Server (NTRS)
Venkatakrishnam, V.
1995-01-01
The development of implicit schemes for obtaining steady state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. Next, the development of explicit and implicit schemes to compute unsteady flows on unstructured grids is discussed. Next, the issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion are outlined. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism
ERIC Educational Resources Information Center
Agarwal, Mayank
2009-01-01
The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…
Parallelizing alternating direction implicit solver on GPUs
USDA-ARS?s Scientific Manuscript database
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.
1994-01-01
The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Parallelization of implicit finite difference schemes in computational fluid dynamics
NASA Technical Reports Server (NTRS)
Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel
1990-01-01
Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code.
NASA Astrophysics Data System (ADS)
Chacon, L.; Knoll, D. A.
2004-11-01
We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended primitive-variable MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field, non-dissipative, and stable in the absence of physical dissipation.(L. Chacón , phComput. Phys. Comm.) submitted (2004) PIXIE3D employs fully-implicit Newton-Krylov methods for the time advance. Currently, first and second-order implicit schemes are available, although higher-order temporal implicit schemes can be effortlessly implemented within the Newton-Krylov framework. A successful, scalable, MG physics-based preconditioning strategy, similar in concept to previous 2D MHD efforts,(L. Chacón et al., phJ. Comput. Phys). 178 (1), 15- 36 (2002); phJ. Comput. Phys., 188 (2), 573-592 (2003) has been developed. We are currently in the process of parallelizing the code using the PETSc library, and a Newton-Krylov-Schwarz approach for the parallel treatment of the preconditioner. In this poster, we will report on both the serial and parallel performance of PIXIE3D, focusing primarily on scalability and CPU speedup vs. an explicit approach.
Three dimensional modelling of earthquake rupture cycles on frictional faults
NASA Astrophysics Data System (ADS)
Simpson, Guy; May, Dave
2017-04-01
We are developing an efficient MPI-parallel numerical method to simulate earthquake sequences on preexisting faults embedding within a three dimensional viscoelastic half-space. We solve the velocity form of the elasto(visco)dynamic equations using a continuous Galerkin Finite Element Method on an unstructured pentahedral mesh, which thus permits local spatial refinement in the vicinity of the fault. Friction sliding is coupled to the viscoelastic solid via rate- and state-dependent friction laws using the split-node technique. Our coupled formulation employs a picard-type non-linear solver with a fully implicit, first order accurate time integrator that utilises an adaptive time step that efficiently evolves the system through multiple seismic cycles. The implementation leverages advanced parallel solvers, preconditioners and linear algebra from the Portable Extensible Toolkit for Scientific Computing (PETSc) library. The model can treat heterogeneous frictional properties and stress states on the fault and surrounding solid as well as non-planar fault geometries. Preliminary tests show that the model successfully reproduces dynamic rupture on a vertical strike-slip fault in a half-space governed by rate-state friction with the ageing law.
Force user's manual: A portable, parallel FORTRAN
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1990-01-01
The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, Toshio
1999-01-01
This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code
NASA Astrophysics Data System (ADS)
Chacon, Luis
2006-10-01
We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field to machine precision, non-dissipative, and linearly and nonlinearly stable in the absence of physical dissipation. PIXIE3D employs fully-implicit Newton-Krylov methods for the time advance. Currently, second-order implicit schemes such as Crank-Nicolson and BDF2 (2^nd order backward differentiation formula) are available. PIXIE3D is fully parallel (employs PETSc for parallelism), and exhibits excellent parallel scalability. A parallel, scalable, MG preconditioning strategy, based on physics-based preconditioning ideas, has been developed for resistive MHD, and is currently being extended to Hall MHD. In this poster, we will report on progress in the algorithmic formulation for extended MHD, as well as the the serial and parallel performance of PIXIE3D in a variety of problems and geometries. L. Chac'on, Comput. Phys. Comm., 163 (3), 143-171 (2004) L. Chac'on et al., J. Comput. Phys. 178 (1), 15- 36 (2002); J. Comput. Phys., 188 (2), 573-592 (2003) L. Chac'on, 32nd EPS Conf. Plasma Physics, Tarragona, Spain, 2005 L. Chac'on et al., 33rd EPS Conf. Plasma Physics, Rome, Italy, 2006
ProperCAD: A portable object-oriented parallel environment for VLSI CAD
NASA Technical Reports Server (NTRS)
Ramkumar, Balkrishna; Banerjee, Prithviraj
1993-01-01
Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described. A Portable object-oriented parallel environment for CAD algorithms (ProperCAD) is being developed. The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (CAD algorithms using a general purpose platform for portable parallel programming called CARM is being developed and a C++ environment that is truly object-oriented and specialized for CAD applications is also being developed); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, a NCUBE 2 hypercube, and a network of Sun Sparc workstations. Performance data for other applications that were developed are provided: namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.
A Novel Implementation of Massively Parallel Three Dimensional Monte Carlo Radiation Transport
NASA Astrophysics Data System (ADS)
Robinson, P. B.; Peterson, J. D. L.
2005-12-01
The goal of our summer project was to implement the difference formulation for radiation transport into Cosmos++, a multidimensional, massively parallel, magneto hydrodynamics code for astrophysical applications (Peter Anninos - AX). The difference formulation is a new method for Symbolic Implicit Monte Carlo thermal transport (Brooks and Szöke - PAT). Formerly, simultaneous implementation of fully implicit Monte Carlo radiation transport in multiple dimensions on multiple processors had not been convincingly demonstrated. We found that a combination of the difference formulation and the inherent structure of Cosmos++ makes such an implementation both accurate and straightforward. We developed a "nearly nearest neighbor physics" technique to allow each processor to work independently, even with a fully implicit code. This technique coupled with the increased accuracy of an implicit Monte Carlo solution and the efficiency of parallel computing systems allows us to demonstrate the possibility of massively parallel thermal transport. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48
Portable parallel portfolio optimization in the Aurora Financial Management System
NASA Astrophysics Data System (ADS)
Laure, Erwin; Moritsch, Hans
2001-07-01
Financial planning problems are formulated as large scale, stochastic, multiperiod, tree structured optimization problems. An efficient technique for solving this kind of problems is the nested Benders decomposition method. In this paper we present a parallel, portable, asynchronous implementation of this technique. To achieve our portability goals we elected the programming language Java for our implementation and used a high level Java based framework, called OpusJava, for expressing the parallelism potential as well as synchronization constraints. Our implementation is embedded within a modular decision support tool for portfolio and asset liability management, the Aurora Financial Management System.
NASA Technical Reports Server (NTRS)
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
ASC-ATDM Performance Portability Requirements for 2015-2019
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Harold C.; Trott, Christian Robert
This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC ) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a., Kokkos) project for 2015 - 2019 . The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread - parallel programming model a nd standard C++ library - based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel pat terns including parallel - for, parallelmore » - reduce, and parallel - scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contain s nested data parallel algorithms. This five y ear plan includes R&D required to f ully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithm s and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism an d performance portability. These five year requirements includes support required for current and antic ipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up - to - date tutorials, documentation, multi - platform (hardware and software stack) testing, minor feature enhancements, thread - scalable algorithm consulting, and managing collaborative R&D.« less
Application Portable Parallel Library
NASA Technical Reports Server (NTRS)
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Portable multi-node LQCD Monte Carlo simulations using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.
Portability and Cross-Platform Performance of an MPI-Based Parallel Polygon Renderer
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1999-01-01
Visualizing the results of computations performed on large-scale parallel computers is a challenging problem, due to the size of the datasets involved. One approach is to perform the visualization and graphics operations in place, exploiting the available parallelism to obtain the necessary rendering performance. Over the past several years, we have been developing algorithms and software to support visualization applications on NASA's parallel supercomputers. Our results have been incorporated into a parallel polygon rendering system called PGL. PGL was initially developed on tightly-coupled distributed-memory message-passing systems, including Intel's iPSC/860 and Paragon, and IBM's SP2. Over the past year, we have ported it to a variety of additional platforms, including the HP Exemplar, SGI Origin2OOO, Cray T3E, and clusters of Sun workstations. In implementing PGL, we have had two primary goals: cross-platform portability and high performance. Portability is important because (1) our manpower resources are limited, making it difficult to develop and maintain multiple versions of the code, and (2) NASA's complement of parallel computing platforms is diverse and subject to frequent change. Performance is important in delivering adequate rendering rates for complex scenes and ensuring that parallel computing resources are used effectively. Unfortunately, these two goals are often at odds. In this paper we report on our experiences with portability and performance of the PGL polygon renderer across a range of parallel computing platforms.
A real-time MPEG software decoder using a portable message-passing library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan
1995-12-31
We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Chiou, Jin-Chern
1990-01-01
Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
NASA Technical Reports Server (NTRS)
Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)
1992-01-01
A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation om massively parallel systems.
A design methodology for portable software on parallel computers
NASA Technical Reports Server (NTRS)
Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.
1993-01-01
This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks are specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Proceedings of the second SISAL users` conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J T; Frerking, C; Miller, P J
1992-12-01
This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
NASA Astrophysics Data System (ADS)
Chacon, L.; Finn, J. M.; Knoll, D. A.
2000-10-01
Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Parallel, Distributed Scripting with Python
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, P J
2002-05-24
Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less
A Second-Order Implicit Knowledge: Its Implications for E-Learning
ERIC Educational Resources Information Center
Noaparast, Khosrow Bagheri
2014-01-01
The dichotomous epistemology of explicit/implicit knowledge has led to two parallel lines of research; one putting the emphasis on explicit knowledge which has been the main road of e-learning, and the other taking implicit knowledge as the core of learning which has shaped a critical line to the current e-learning. It is argued in this article…
Explicit pre-training instruction does not improve implicit perceptual-motor sequence learning
Sanchez, Daniel J.; Reber, Paul J.
2012-01-01
Memory systems theory argues for separate neural systems supporting implicit and explicit memory in the human brain. Neuropsychological studies support this dissociation, but empirical studies of cognitively healthy participants generally observe that both kinds of memory are acquired to at least some extent, even in implicit learning tasks. A key question is whether this observation reflects parallel intact memory systems or an integrated representation of memory in healthy participants. Learning of complex tasks in which both explicit instruction and practice is used depends on both kinds of memory, and how these systems interact will be an important component of the learning process. Theories that posit an integrated, or single, memory system for both types of memory predict that explicit instruction should contribute directly to strengthening task knowledge. In contrast, if the two types of memory are independent and acquired in parallel, explicit knowledge should have no direct impact and may serve in a “scaffolding” role in complex learning. Using an implicit perceptual-motor sequence learning task, the effect of explicit pre-training instruction on skill learning and performance was assessed. Explicit pre-training instruction led to robust explicit knowledge, but sequence learning did not benefit from the contribution of pre-training sequence memorization. The lack of an instruction benefit suggests that during skill learning, implicit and explicit memory operate independently. While healthy participants will generally accrue parallel implicit and explicit knowledge in complex tasks, these types of information appear to be separately represented in the human brain consistent with multiple memory systems theory. PMID:23280147
Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie
2014-01-01
It is very time consuming to solve fractional differential equations. The computational complexity of two-dimensional fractional differential equation (2D-TFDE) with iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
2014-07-22
The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diversemore » manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.« less
Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions
NASA Astrophysics Data System (ADS)
Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey
2014-01-01
The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-30
... collection of information is necessary for the proper performance of the functions of the Commission... rate-of-return carriers (Interstate Common Line Support (ICLS)) to replace implicit support in interstate access charges with explicit support that is portable to all eligible telecommunications carriers...
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH
NASA Astrophysics Data System (ADS)
Lee, D.; Gopal, S.; Mohapatra, P.
2012-07-01
We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
Convergence issues in domain decomposition parallel computation of hovering rotor
NASA Astrophysics Data System (ADS)
Xiao, Zhongyun; Liu, Gang; Mou, Bin; Jiang, Xiong
2018-05-01
Implicit LU-SGS time integration algorithm has been widely used in parallel computation in spite of its lack of information from adjacent domains. When applied to parallel computation of hovering rotor flows in a rotating frame, it brings about convergence issues. To remedy the problem, three LU factorization-based implicit schemes (consisting of LU-SGS, DP-LUR and HLU-SGS) are investigated comparatively. A test case of pure grid rotation is designed to verify these algorithms, which show that LU-SGS algorithm introduces errors on boundary cells. When partition boundaries are circumferential, errors arise in proportion to grid speed, accumulating along with the rotation, and leading to computational failure in the end. Meanwhile, DP-LUR and HLU-SGS methods show good convergence owing to boundary treatment which are desirable in domain decomposition parallel computations.
Application of CHAD hydrodynamics to shock-wave problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trease, H.E.; O`Rourke, P.J.; Sahota, M.S.
1997-12-31
CHAD is the latest in a sequence of continually evolving computer codes written to effectively utilize massively parallel computer architectures and the latest grid generators for unstructured meshes. Its applications range from automotive design issues such as in-cylinder and manifold flows of internal combustion engines, vehicle aerodynamics, underhood cooling and passenger compartment heating, ventilation, and air conditioning to shock hydrodynamics and materials modeling. CHAD solves the full unsteady Navier-Stoke equations with the k-epsilon turbulence model in three space dimensions. The code has four major features that distinguish it from the earlier KIVA code, also developed at Los Alamos. First, itmore » is based on a node-centered, finite-volume method in which, like finite element methods, all fluid variables are located at computational nodes. The computational mesh efficiently and accurately handles all element shapes ranging from tetrahedra to hexahedra. Second, it is written in standard Fortran 90 and relies on automatic domain decomposition and a universal communication library written in standard C and MPI for unstructured grids to effectively exploit distributed-memory parallel architectures. Thus the code is fully portable to a variety of computing platforms such as uniprocessor workstations, symmetric multiprocessors, clusters of workstations, and massively parallel platforms. Third, CHAD utilizes a variable explicit/implicit upwind method for convection that improves computational efficiency in flows that have large velocity Courant number variations due to velocity of mesh size variations. Fourth, CHAD is designed to also simulate shock hydrodynamics involving multimaterial anisotropic behavior under high shear. The authors will discuss CHAD capabilities and show several sample calculations showing the strengths and weaknesses of CHAD.« less
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Saini, Subhash (Technical Monitor)
1998-01-01
Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Efficient parallel implicit methods for rotary-wing aerodynamics calculations
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.
Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid
1999-01-01
The ability to dynamically adapt an unstructured -rid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the view point of portability on various multiprocessor platforms We address this problem by developing PLUM, tin automatic anti architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with a goal to providing a global view of system loads across processors. Experiments on, an SP2 and an Origin2000 demonstrate the portability of our approach which achieves superb load balance at the cost of minimal extra overhead.
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-12-27
... performance of the functions of the Commission, including whether the information shall have practical utility...-of-return carriers (Interstate Common Line Support (ICLS)) to replace implicit support in interstate access charges with explicit support that is portable to all eligible telecommunications carriers. The...
Parallel Implicit Runge-Kutta Methods Applied to Coupled Orbit/Attitude Propagation
NASA Astrophysics Data System (ADS)
Hatten, Noble; Russell, Ryan P.
2017-12-01
A variable-step Gauss-Legendre implicit Runge-Kutta (GLIRK) propagator is applied to coupled orbit/attitude propagation. Concepts previously shown to improve efficiency in 3DOF propagation are modified and extended to the 6DOF problem, including the use of variable-fidelity dynamics models. The impact of computing the stage dynamics of a single step in parallel is examined using up to 23 threads and 22 associated GLIRK stages; one thread is reserved for an extra dynamics function evaluation used in the estimation of the local truncation error. Efficiency is found to peak for typical examples when using approximately 8 to 12 stages for both serial and parallel implementations. Accuracy and efficiency compare favorably to explicit Runge-Kutta and linear-multistep solvers for representative scenarios. However, linear-multistep methods are found to be more efficient for some applications, particularly in a serial computing environment, or when parallelism can be applied across multiple trajectories.
NASA Astrophysics Data System (ADS)
Borazjani, Iman; Asgharzadeh, Hafez
2015-11-01
Flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates with explicit and semi-implicit schemes. Implicit schemes can be used to overcome these restrictions. However, implementing implicit solver for nonlinear equations including Navier-Stokes is not straightforward. Newton-Krylov subspace methods (NKMs) are one of the most advanced iterative methods to solve non-linear equations such as implicit descritization of the Navier-Stokes equation. The efficiency of NKMs massively depends on the Jacobian formation method, e.g., automatic differentiation is very expensive, and matrix-free methods slow down as the mesh is refined. Analytical Jacobian is inexpensive method, but derivation of analytical Jacobian for Navier-Stokes equation on staggered grid is challenging. The NKM with a novel analytical Jacobian was developed and validated against Taylor-Green vortex and pulsatile flow in a 90 degree bend. The developed method successfully handled the complex geometries such as an intracranial aneurysm with multiple overset grids, and immersed boundaries. It is shown that the NKM with an analytical Jacobian is 3 to 25 times faster than the fixed-point implicit Runge-Kutta method, and more than 100 times faster than automatic differentiation depending on the grid (size) and the flow problem. The developed methods are fully parallelized with parallel efficiency of 80-90% on the problems tested.
Adaptive implicit-explicit and parallel element-by-element iteration schemes
NASA Technical Reports Server (NTRS)
Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.
1989-01-01
Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.
Aeras: A next generation global atmosphere model
Spotz, William F.; Smith, Thomas M.; Demeshko, Irina P.; ...
2015-06-01
Sandia National Laboratories is developing a new global atmosphere model named Aeras that is performance portable and supports the quantification of uncertainties. These next-generation capabilities are enabled by building Aeras on top of Albany, a code base that supports the rapid development of scientific application codes while leveraging Sandia's foundational mathematics and computer science packages in Trilinos and Dakota. Embedded uncertainty quantification (UQ) is an original design capability of Albany, and performance portability is a recent upgrade. Other required features, such as shell-type elements, spectral elements, efficient explicit and semi-implicit time-stepping, transient sensitivity analysis, and concurrent ensembles, were not componentsmore » of Albany as the project began, and have been (or are being) added by the Aeras team. We present early UQ and performance portability results for the shallow water equations.« less
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Radiation-MHD Simulations of Pillars and Globules in HII Regions
NASA Astrophysics Data System (ADS)
Mackey, J.
2012-07-01
Implicit and explicit raytracing-photoionisation algorithms have been implemented in the author's radiation-magnetohydrodynamics code. The algorithms are described briefly and their efficiency and parallel scaling are investigated. The implicit algorithm is more efficient for calculations where ionisation fronts have very supersonic velocities, and the explicit algorithm is favoured in the opposite limit because of its better parallel scaling. The implicit method is used to investigate the effects of initially uniform magnetic fields on the formation and evolution of dense pillars and cometary globules at the boundaries of HII regions. It is shown that for weak and medium field strengths an initially perpendicular field is swept into alignment with the pillar during its dynamical evolution, matching magnetic field observations of the ‘Pillars of Creation’ in M16. A strong perpendicular magnetic field remains in its initial configuration and also confines the photoevaporation flow into a bar-shaped, dense, ionised ribbon which partially shields the ionisation front.
Maximal clique enumeration with data-parallel primitives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lessley, Brenton; Perciano, Talita; Mathai, Manish
The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-time speedup and 9-time speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attainmore » additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.« less
Parallelized implicit propagators for the finite-difference Schrödinger equation
NASA Astrophysics Data System (ADS)
Parker, Jonathan; Taylor, K. T.
1995-08-01
We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
White and Black American Children’s Implicit Intergroup Bias
Newheiser, Anna-Kaisa; Olson, Kristina R.
2011-01-01
Despite a decline in explicit prejudice, adults and children from majority groups (e.g., White Americans) often express bias implicitly, as assessed by the Implicit Association Test. In contrast, minority-group (e.g., Black American) adults on average show no bias on the IAT. In the present research, representing the first empirical investigation of whether Black children’s IAT responses parallel those of Black adults, we examined implicit bias in 7–11-year-old White and Black American children. Replicating previous findings with adults, whereas White children showed a robust ingroup bias, Black children showed no bias. Additionally, we investigated the role of valuing status in the development of implicit bias. For Black children, explicit preference for high status predicted implicit outgroup bias: Black children who explicitly expressed high preference for rich (vs. poor) people showed an implicit preference for Whites comparable in magnitude to White children’s ingroup bias. Implications for research on intergroup bias are discussed. PMID:22184478
Developing Information Power Grid Based Algorithms and Software
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
G.A. Pope; K. Sephernoori; D.C. McKinney
1996-03-15
This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computersmore » using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was mad possible.« less
NASA Astrophysics Data System (ADS)
Ku, Seung-Hoe; Hager, R.; Chang, C. S.; Chacon, L.; Chen, G.; EPSI Team
2016-10-01
The cancelation problem has been a long-standing issue for long wavelengths modes in electromagnetic gyrokinetic PIC simulations in toroidal geometry. As an attempt of resolving this issue, we implemented a fully implicit time integration scheme in the full-f, gyrokinetic PIC code XGC1. The new scheme - based on the implicit Vlasov-Darwin PIC algorithm by G. Chen and L. Chacon - can potentially resolve cancelation problem. The time advance for the field and the particle equations is space-time-centered, with particle sub-cycling. The resulting system of equations is solved by a Picard iteration solver with fixed-point accelerator. The algorithm is implemented in the parallel velocity formalism instead of the canonical parallel momentum formalism. XGC1 specializes in simulating the tokamak edge plasma with magnetic separatrix geometry. A fully implicit scheme could be a way to accurate and efficient gyrokinetic simulations. We will test if this numerical scheme overcomes the cancelation problem, and reproduces the dispersion relation of Alfven waves and tearing modes in cylindrical geometry. Funded by US DOE FES and ASCR, and computing resources provided by OLCF through ALCC.
Performance Analysis and Portability of the PLUM Load Balancing System
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1998-01-01
The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on a SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.
Xia, Yidong; Luo, Hong; Frisbey, Megan; ...
2014-07-01
A set of implicit methods are proposed for a third-order hierarchical WENO reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids. An attractive feature in these methods are the application of the Jacobian matrix based on the P1 element approximation, resulting in a huge reduction of memory requirement compared with DG (P2). Also, three approaches -- analytical derivation, divided differencing, and automatic differentiation (AD) are presented to construct the Jacobian matrix respectively, where the AD approach shows the best robustness. A variety of compressible flow problems are computed to demonstrate the fast convergence property of the implemented flowmore » solver. Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries indicate that this low-storage implicit method can provide a viable and attractive DG solution for complicated flows of practical importance.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tom Anderson; David Culler; James Demmel
2000-02-16
The goal of the Castle project was to provide a parallel programming environment that enables the construction of high performance applications that run portably across many platforms. The authors approach was to design and implement a multilayered architecture, with higher levels building on lower ones to ensure portability, but with care taken not to introduce abstractions that sacrifice performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng, Xiang; Yang, Chao; State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190
2015-03-15
We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracymore » (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.« less
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
Motivation modulates the effect of approach on implicit preferences.
Zogmaister, Cristina; Perugini, Marco; Richetin, Juliette
2016-08-01
With three studies, we investigated whether motivational states can modulate the formation of implicit preferences. In Study 1, participants played a video game in which they repeatedly approached one of two similar beverages, while disregarding the other. A subsequent implicit preference for the target beverage emerged, which increased with participants' thirst. In Study 2, participants approached one brand of potato chips while avoiding the other: Conceptually replicating the moderation observed in Study 1, the implicit preference for the approached brand increased with the number of hours from last food intake. In Study 3, we experimentally manipulated hunger, and the moderation effect emerged again, with hungry participants displaying a higher implicit preference for the approached brand, as compared to satiated participants. In the three studies, the moderation effect was not paralleled in explicit preferences although the latter were affected by the preference inducing manipulation. Theoretical implications and open questions are discussed.
Explicit and Implicit Approach Motivation Interact to Predict Interpersonal Arrogance
Robinson, Michael D.; Ode, Scott; Spencer L., Palder; Fetterman, Adam K.
2012-01-01
Self-reports of approach motivation are unlikely to be sufficient in understanding the extent to which the individual reacts to appetitive cues in an approach-related manner. A novel implicit probe of approach tendencies was thus developed, one that assessed the extent to which positive affective (versus neutral) stimuli primed larger size estimates, as larger perceptual sizes co-occur with locomotion toward objects in the environment. In two studies (total N = 150), self-reports of approach motivation interacted with this implicit probe of approach motivation to predict individual differences in arrogance, a broad interpersonal dimension previously linked to narcissism, antisocial personality tendencies, and aggression. The results of the two studies were highly parallel in that self-reported levels of approach motivation predicted interpersonal arrogance in the particular context of high, but not low, levels of implicit approach motivation. Implications for understanding approach motivation, implicit probes of it, and problematic approach-related outcomes are discussed. PMID:22399360
Explicit and implicit approach motivation interact to predict interpersonal arrogance.
Robinson, Michael D; Ode, Scott; Palder, Spencer L; Fetterman, Adam K
2012-07-01
Self-reports of approach motivation are unlikely to be sufficient in understanding the extent to which the individual reacts to appetitive cues in an approach-related manner. A novel implicit probe of approach tendencies was thus developed, one that assessed the extent to which positive affective (versus neutral) stimuli primed larger size estimates, as larger perceptual sizes co-occur with locomotion toward objects in the environment. In two studies (total N = 150), self-reports of approach motivation interacted with this implicit probe of approach motivation to predict individual differences in arrogance, a broad interpersonal dimension previously linked to narcissism, antisocial personality tendencies, and aggression. The results of the two studies were highly parallel in that self-reported levels of approach motivation predicted interpersonal arrogance in the particular context of high, but not low, levels of implicit approach motivation. Implications for understanding approach motivation, implicit probes of it, and problematic approach-related outcomes are discussed.
A portable MPI-based parallel vector template library
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.
1995-01-01
This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
A Portable MPI-Based Parallel Vector Template Library
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.
1995-01-01
This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
Roche, Bryan; O’Reilly, Anthony; Gavin, Amanda; Ruiz, Maria R.; Arancibia, Gabriela
2012-01-01
Background The development of implicit tests for measuring biases and behavioral predispositions is a recent development within psychology. While such tests are usually researched within a social-cognitive paradigm, behavioral researchers have also begun to view these tests as potential tests of conditioning histories, including in the sexual domain. Objective The objective of this paper is to illustrate the utility of a behavioral approach to implicit testing and means by which implicit tests can be built to the standards of behavioral psychologists. Design Research findings illustrating the short history of implicit testing within the experimental analysis of behavior are reviewed. Relevant parallel and overlapping research findings from the field of social cognition and on the Implicit Association Test are also outlined. Results New preliminary data obtained with both normal and sex offender populations are described in order to illustrate how behavior-analytically conceived implicit tests may have potential as investigative tools for assessing histories of sexual arousal conditioning and derived stimulus associations. Conclusion It is concluded that popular implicit tests are likely sensitive to conditioned and derived stimulus associations in the history of the test-taker rather than ‘unconscious cognitions’, per se. PMID:24693346
Development and Verification of the Charring Ablating Thermal Protection Implicit System Solver
NASA Technical Reports Server (NTRS)
Amar, Adam J.; Calvert, Nathan D.; Kirk, Benjamin S.
2010-01-01
The development and verification of the Charring Ablating Thermal Protection Implicit System Solver is presented. This work concentrates on the derivation and verification of the stationary grid terms in the equations that govern three-dimensional heat and mass transfer for charring thermal protection systems including pyrolysis gas flow through the porous char layer. The governing equations are discretized according to the Galerkin finite element method with first and second order implicit time integrators. The governing equations are fully coupled and are solved in parallel via Newton's method, while the fully implicit linear system is solved with the Generalized Minimal Residual method. Verification results from exact solutions and the Method of Manufactured Solutions are presented to show spatial and temporal orders of accuracy as well as nonlinear convergence rates.
A Theory Upon Origin of Implicit Musical Language.
Vas József, P
2015-11-30
The author suggests that the origin of musicality is implied in an implicit musical language every human being possesses in uterus due to a resonance and attunement with prenatal environment, mainly the mother. It is emphasized that ego-development and evolving implicit musical language can be regarded as parallel processes. To support this idea a lot of examples of musical representations are demonstrated by the author. Music is viewed as a tone of ego-functioning involving the musical representations of bodily and visceral senses, cross-modal perception, unity of sense of self, individual fate of ego, and tripolar and bipolar musical coping codes. Finally, a special form of music therapy is shown to illustrate how can implicit musical language be transformed into explicit language by virtue of participants' spontaneity, creativity, and playfulness.
A Theory Upon Origin of Implicit Musical Language
Vas József, P.
2015-01-01
The author suggests that the origin of musicality is implied in an implicit musical language every human being possesses in uterus due to a resonance and attunement with prenatal environment, mainly the mother. It is emphasized that ego-development and evolving implicit musical language can be regarded as parallel processes. To support this idea a lot of examples of musical representations are demonstrated by the author. Music is viewed as a tone of ego-functioning involving the musical representations of bodily and visceral senses, cross-modal perception, unity of sense of self, individual fate of ego, and tripolar and bipolar musical coping codes. Finally, a special form of music therapy is shown to illustrate how can implicit musical language be transformed into explicit language by virtue of participants’ spontaneity, creativity, and playfulness. PMID:26973966
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harms, Kevin; Oral, H. Sarp; Atchley, Scott
The Oak Ridge and Argonne Leadership Computing Facilities are both receiving new systems under the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program. Because they are both part of the INCITE program, applications need to be portable between these two facilities. However, the Summit and Aurora systems will be vastly different architectures, including their I/O subsystems. While both systems will have POSIX-compliant parallel file systems, their Burst Buffer technologies will be different. This difference may pose challenges to application portability between facilities. Application developers need to pay attention to specific burst buffer implementations to maximize code portability.
ERIC Educational Resources Information Center
Lundquist, Carol; Frieder, Ophir; Holmes, David O.; Grossman, David
1999-01-01
Describes a scalable, parallel, relational database-drive information retrieval engine. To support portability across a wide range of execution environments, all algorithms adhere to the SQL-92 standard. By incorporating relevance feedback algorithms, accuracy is enhanced over prior database-driven information retrieval efforts. Presents…
3-D modeling of ductile tearing using finite elements: Computational aspects and techniques
NASA Astrophysics Data System (ADS)
Gullerud, Arne Stewart
This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolutes sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process. The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
Propranolol reduces implicit negative racial bias.
Terbeck, Sylvia; Kahane, Guy; McTavish, Sarah; Savulescu, Julian; Cowen, Philip J; Hewstone, Miles
2012-08-01
Implicit negative attitudes towards other races are important in certain kinds of prejudicial social behaviour. Emotional mechanisms are thought to be involved in mediating implicit "outgroup" bias but there is little evidence concerning the underlying neurobiology. The aim of the present study was to examine the role of noradrenergic mechanisms in the generation of implicit racial attitudes. Healthy volunteers (n = 36) of white ethnic origin, received a single oral dose of the β-adrenoceptor antagonist, propranolol (40 mg), in a randomised, double-blind, parallel group, placebo-controlled, design. Participants completed an explicit measure of prejudice and the racial implicit association test (IAT), 1-2 h after propranolol administration. Relative to placebo, propranolol significantly lowered heart rate and abolished implicit racial bias, without affecting the measure of explicit racial prejudice. Propranolol did not affect subjective mood. Our results indicate that β-adrenoceptors play a role in the expression of implicit racial attitudes suggesting that noradrenaline-related emotional mechanisms may mediate negative racial bias. Our findings may also have practical importance given that propranolol is a widely used drug. However, further studies will be needed to examine whether a similar effect can be demonstrated in the course of clinical treatment.
A survey of packages for large linear systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Milne, Brent
2000-02-11
This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less
A practical approach to portability and performance problems on massively parallel supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beazley, D.M.; Lomdahl, P.S.
1994-12-08
We present an overview of the tactics we have used to achieve a high-level of performance while improving portability for a large-scale molecular dynamics code SPaSM. SPaSM was originally implemented in ANSI C with message passing for the Connection Machine 5 (CM-5). In 1993, SPaSM was selected as one of the winners in the IEEE Gordon Bell Prize competition for sustaining 50 Gflops on the 1024 node CM-5 at Los Alamos National Laboratory. Achieving this performance on the CM-5 required rewriting critical sections of code in CDPEAC assembler language. In addition, the code made extensive use of CM-5 parallel I/Omore » and the CMMD message passing library. Given this highly specialized implementation, we describe how we have ported the code to the Cray T3D and high performance workstations. In addition we will describe how it has been possible to do this using a single version of source code that runs on all three platforms without sacrificing any performance. Sound too good to be true? We hope to demonstrate that one can realize both code performance and portability without relying on the latest and greatest prepackaged tool or parallelizing compiler.« less
Peters, Kurt R; Gawronski, Bertram
2011-04-01
Research has demonstrated that implicit and explicit evaluations of the same object can diverge. Explanations of such dissociations frequently appeal to dual-process theories, such that implicit evaluations are assumed to reflect object-valence contingencies independent of their perceived validity, whereas explicit evaluations reflect the perceived validity of object-valence contingencies. Although there is evidence supporting these assumptions, it remains unclear if dissociations can arise in situations in which object-valence contingencies are judged to be true or false during the learning of these contingencies. Challenging dual-process accounts that propose a simultaneous operation of two parallel learning mechanisms, results from three experiments showed that the perceived validity of evaluative information about social targets qualified both explicit and implicit evaluations when validity information was available immediately after the encoding of the valence information; however, delaying the presentation of validity information reduced its qualifying impact for implicit, but not explicit, evaluations.
Horton, Keith D; Wilson, Daryl E; Vonk, Jennifer; Kirby, Sarah L; Nielsen, Tina
2005-07-01
Using the stem completion task, we compared estimates of automatic retrieval from an implicit memory task, the process dissociation procedure, and the speeded response procedure. Two standard manipulations were employed. In Experiment 1, a depth of processing effect was found on automatic retrieval using the speeded response procedure although this effect was substantially reduced in Experiment 2 when lexical processing was required of all words. In Experiment 3, the speeded response procedure showed an advantage of full versus divided attention at study on automatic retrieval. An implicit condition showed parallel effects in each study, suggesting that implicit stem completion may normally provide a good estimate of automatic retrieval. Also, we replicated earlier findings from the process dissociation procedure, but estimates of automatic retrieval from this procedure were consistently lower than those from the speeded response procedure, except when conscious retrieval was relatively low. We discuss several factors that may contribute to the conflicting outcomes, including the evidence for theoretical assumptions and criterial task differences between implicit and explicit tests.
The transfer of category knowledge by macaques (Macaca mulatta) and humans (Homo sapiens).
Zakrzewski, Alexandria C; Church, Barbara A; Smith, J David
2018-02-01
Cognitive psychologists distinguish implicit, procedural category learning (stimulus-response associations learned outside declarative cognition) from explicit-declarative category learning (conscious category rules). These systems are dissociated by category learning tasks with either a multidimensional, information-integration (II) solution or a unidimensional, rule-based (RB) solution. In the present experiments, humans and two monkeys learned II and RB category tasks fostering implicit and explicit learning, respectively. Then they received occasional transfer trials-never directly reinforced-drawn from untrained regions of the stimulus space. We hypothesized that implicit-procedural category learning-allied to associative learning-would transfer weakly because it is yoked to the training stimuli. This result was confirmed for humans and monkeys. We hypothesized that explicit category learning-allied to abstract category rules-would transfer robustly. This result was confirmed only for humans. That is, humans displayed explicit category knowledge that transferred flawlessly. Monkeys did not. This result illuminates the distinctive abstractness, stimulus independence, and representational portability of humans' explicit category rules. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Newton-like methods for Navier-Stokes solution
NASA Astrophysics Data System (ADS)
Qin, N.; Xu, X.; Richards, B. E.
1992-12-01
The paper reports on Newton-like methods called SFDN-alpha-GMRES and SQN-alpha-GMRES methods that have been devised and proven as powerful schemes for large nonlinear problems typical of viscous compressible Navier-Stokes solutions. They can be applied using a partially converged solution from a conventional explicit or approximate implicit method. Developments have included the efficient parallelization of the schemes on a distributed memory parallel computer. The methods are illustrated using a RISC workstation and a transputer parallel system respectively to solve a hypersonic vortical flow.
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenAcc, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
Religion insulates ingroup evaluations: the development of intergroup attitudes in India.
Dunham, Yarrow; Srinivasan, Mahesh; Dotsch, Ron; Barner, David
2014-03-01
Research on the development of implicit intergroup attitudes has placed heavy emphasis on race, leaving open how social categories that are prominent in other cultures might operate. We investigate two of India's primary means of social distinction, caste and religion, and explore the development of implicit and explicit attitudes towards these groups in minority-status Muslim children and majority-status Hindu children, the latter drawn from various positions in the Hindu caste system. Results from two tests of implicit attitudes find that caste attitudes parallel previous findings for race: higher-caste children as well as lower-caste children have robust high-caste preferences. However, results for religion were strikingly different: both lower-status Muslim children and higher-status Hindu children show strong implicit ingroup preferences. We suggest that religion may play a protective role in insulating children from the internalization of stigma. © 2013 John Wiley & Sons Ltd.
Colangelo, Annette; Buchanan, Lori
2006-12-01
The failure of inhibition hypothesis posits a theoretical distinction between implicit and explicit access in deep dyslexia. Specifically, the effects of failure of inhibition are assumed only in conditions that have an explicit selection requirement in the context of production (i.e., aloud reading). In contrast, the failure of inhibition hypothesis proposes that implicit processing and explicit access to semantic information without production demands are intact in deep dyslexia. Evidence for intact implicit and explicit access requires that performance in deep dyslexia parallels that observed in neurologically intact participants on tasks based on implicit and explicit processes. In other words, deep dyslexics should produce normal effects in conditions with implicit task demands (i.e., lexical decision) and on tasks based on explicit access without production (i.e., forced choice semantic decisions) because failure of inhibition does not impact the availability of lexical information, only explicit retrieval in the context of production. This research examined the distinction between implicit and explicit processes in deep dyslexia using semantic blocking in lexical decision and forced choice semantic decisions as a test for the failure of inhibition hypothesis. The results of the semantic blocking paradigm support the distinction between implicit and explicit processing and provide evidence for failure of inhibition as an explanation for semantic errors in deep dyslexia.
A C++ Thread Package for Concurrent and Parallel Programming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jie Chen; William Watson
1999-11-01
Recently thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. Those thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduces challenges for those who wish to develop portable applications.
Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning.
McDougle, Samuel D; Bond, Krista M; Taylor, Jordan A
2015-07-01
A popular model of human sensorimotor learning suggests that a fast process and a slow process work in parallel to produce the canonical learning curve (Smith et al., 2006). Recent evidence supports the subdivision of sensorimotor learning into explicit and implicit processes that simultaneously subserve task performance (Taylor et al., 2014). We set out to test whether these two accounts of learning processes are homologous. Using a recently developed method to assay explicit and implicit learning directly in a sensorimotor task, along with a computational modeling analysis, we show that the fast process closely resembles explicit learning and the slow process approximates implicit learning. In addition, we provide evidence for a subdivision of the slow/implicit process into distinct manifestations of motor memory. We conclude that the two-state model of motor learning is a close approximation of sensorimotor learning, but it is unable to describe adequately the various implicit learning operations that forge the learning curve. Our results suggest that a wider net be cast in the search for the putative psychological mechanisms and neural substrates underlying the multiplicity of processes involved in motor learning. Copyright © 2015 the authors 0270-6474/15/359568-12$15.00/0.
Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning
Bond, Krista M.; Taylor, Jordan A.
2015-01-01
A popular model of human sensorimotor learning suggests that a fast process and a slow process work in parallel to produce the canonical learning curve (Smith et al., 2006). Recent evidence supports the subdivision of sensorimotor learning into explicit and implicit processes that simultaneously subserve task performance (Taylor et al., 2014). We set out to test whether these two accounts of learning processes are homologous. Using a recently developed method to assay explicit and implicit learning directly in a sensorimotor task, along with a computational modeling analysis, we show that the fast process closely resembles explicit learning and the slow process approximates implicit learning. In addition, we provide evidence for a subdivision of the slow/implicit process into distinct manifestations of motor memory. We conclude that the two-state model of motor learning is a close approximation of sensorimotor learning, but it is unable to describe adequately the various implicit learning operations that forge the learning curve. Our results suggest that a wider net be cast in the search for the putative psychological mechanisms and neural substrates underlying the multiplicity of processes involved in motor learning. PMID:26134640
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver
NASA Astrophysics Data System (ADS)
Chacon, Luis
2008-11-01
Recently, invited talk at DPP07 an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including: electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chac'on, Phys. Plasmas 15, 056103 (2008);
A GPU-accelerated implicit meshless method for compressible flows
NASA Astrophysics Data System (ADS)
Zhang, Jia-Le; Ma, Zhi-Hua; Chen, Hong-Quan; Cao, Cheng
2018-05-01
This paper develops a recently proposed GPU based two-dimensional explicit meshless method (Ma et al., 2014) by devising and implementing an efficient parallel LU-SGS implicit algorithm to further improve the computational efficiency. The capability of the original 2D meshless code is extended to deal with 3D complex compressible flow problems. To resolve the inherent data dependency of the standard LU-SGS method, which causes thread-racing conditions destabilizing numerical computation, a generic rainbow coloring method is presented and applied to organize the computational points into different groups by painting neighboring points with different colors. The original LU-SGS method is modified and parallelized accordingly to perform calculations in a color-by-color manner. The CUDA Fortran programming model is employed to develop the key kernel functions to apply boundary conditions, calculate time steps, evaluate residuals as well as advance and update the solution in the temporal space. A series of two- and three-dimensional test cases including compressible flows over single- and multi-element airfoils and a M6 wing are carried out to verify the developed code. The obtained solutions agree well with experimental data and other computational results reported in the literature. Detailed analysis on the performance of the developed code reveals that the developed CPU based implicit meshless method is at least four to eight times faster than its explicit counterpart. The computational efficiency of the implicit method could be further improved by ten to fifteen times on the GPU.
Users manual for the Chameleon parallel programming tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, W.; Smith, B.
1993-06-01
Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon`s goals are to (a) be very lightweight (low over-head), (b) be highlymore » portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon`s support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementation for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.« less
Optimisation of a parallel ocean general circulation model
NASA Astrophysics Data System (ADS)
Beare, M. I.; Stevens, D. P.
1997-10-01
This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Nadkarni, P M; Miller, P L
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
2012-12-01
identity operation SIMD Single instruction, multiple datastream parallel computing Scala A byte-compiled programming language featuring dynamic type...Specific Languages 5a. CONTRACT NUMBER FA8750-10-1-0191 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 61101E 6. AUTHOR(S) Armando Fox 5d...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency
Quantitative determinations using portable Raman spectroscopy.
Navin, Chelliah V; Tondepu, Chaitanya; Toth, Roxana; Lawson, Latevi S; Rodriguez, Jason D
2017-03-20
A portable Raman spectrometer was used to develop chemometric models to determine percent (%) drug release and potency for 500mg ciprofloxacin HCl tablets. Parallel dissolution and chromatographic experiments were conducted alongside Raman experiments to assess and compare the performance and capabilities of portable Raman instruments in determining critical drug attributes. All batches tested passed the 30min dissolution specification and the Raman model for drug release was able to essentially reproduce the dissolution profiles obtained by ultraviolet spectroscopy at 276nm for all five batches of the 500mg ciprofloxacin tablets. The five batches of 500mg ciprofloxacin tablets also passed the potency (assay) specification and the % label claim for the entire set of tablets run were nearly identical, 99.4±5.1 for the portable Raman method and 99.2±1.2 for the chromatographic method. The results indicate that portable Raman spectrometers can be used to perform quantitative analysis of critical product attributes of finished drug products. The findings of this study indicate that portable Raman may have applications in the areas of process analytical technology and rapid pharmaceutical surveillance. Published by Elsevier B.V.
Element-topology-independent preconditioners for parallel finite element computations
NASA Technical Reports Server (NTRS)
Park, K. C.; Alexander, Scott
1992-01-01
A family of preconditioners for the solution of finite element equations are presented, which are element-topology independent and thus can be applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
Development and Verification of the Charring, Ablating Thermal Protection Implicit System Simulator
NASA Technical Reports Server (NTRS)
Amar, Adam J.; Calvert, Nathan; Kirk, Benjamin S.
2011-01-01
The development and verification of the Charring Ablating Thermal Protection Implicit System Solver (CATPISS) is presented. This work concentrates on the derivation and verification of the stationary grid terms in the equations that govern three-dimensional heat and mass transfer for charring thermal protection systems including pyrolysis gas flow through the porous char layer. The governing equations are discretized according to the Galerkin finite element method (FEM) with first and second order fully implicit time integrators. The governing equations are fully coupled and are solved in parallel via Newton s method, while the linear system is solved via the Generalized Minimum Residual method (GMRES). Verification results from exact solutions and Method of Manufactured Solutions (MMS) are presented to show spatial and temporal orders of accuracy as well as nonlinear convergence rates.
Nadkarni, P. M.; Miller, P. L.
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput.
Gierahn, Todd M; Wadsworth, Marc H; Hughes, Travis K; Bryson, Bryan D; Butler, Andrew; Satija, Rahul; Fortune, Sarah; Love, J Christopher; Shalek, Alex K
2017-04-01
Single-cell RNA-seq can precisely resolve cellular states, but applying this method to low-input samples is challenging. Here, we present Seq-Well, a portable, low-cost platform for massively parallel single-cell RNA-seq. Barcoded mRNA capture beads and single cells are sealed in an array of subnanoliter wells using a semipermeable membrane, enabling efficient cell lysis and transcript capture. We use Seq-Well to profile thousands of primary human macrophages exposed to Mycobacterium tuberculosis.
1991-03-01
factor which made TTL-design so powerful was the implicit knowledge that for any object in the TTL Databook, that object’s implementation and...functions as values. Thus, its reasoning power matches the descriptive power of the higher order languages in the previous section. First, the definitions...developing parallel algorithms to better utilize the power of the explicitly parallel programming language constructs. Currently, the methodologies
Toward performance portability of the Albany finite element analysis code using the Kokkos library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We presentmore » performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPU’s), Intel Xeon Phis, and multicore CPUs.« less
Toward performance portability of the Albany finite element analysis code using the Kokkos library
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.; ...
2018-02-05
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We presentmore » performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPU’s), Intel Xeon Phis, and multicore CPUs.« less
PRAIS: Distributed, real-time knowledge-based systems made easy
NASA Technical Reports Server (NTRS)
Goldstein, David G.
1990-01-01
This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John
2016-01-01
Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Discrete Diffusion Monte Carlo for Electron Thermal Transport
NASA Astrophysics Data System (ADS)
Chenhall, Jeffrey; Cao, Duc; Wollaeger, Ryan; Moses, Gregory
2014-10-01
The iSNB (implicit Schurtz Nicolai Busquet electron thermal transport method of Cao et al. is adapted to a Discrete Diffusion Monte Carlo (DDMC) solution method for eventual inclusion in a hybrid IMC-DDMC (Implicit Monte Carlo) method. The hybrid method will combine the efficiency of a diffusion method in short mean free path regions with the accuracy of a transport method in long mean free path regions. The Monte Carlo nature of the approach allows the algorithm to be massively parallelized. Work to date on the iSNB-DDMC method will be presented. This work was supported by Sandia National Laboratory - Albuquerque.
NASA Astrophysics Data System (ADS)
Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel
2012-10-01
In order to solve problems such as the ion coalescence and slow MHD shocks fully kinetically we developed a fully implicit 2D energy and charge conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.
The BLAZE language: A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shipman, Galen M.
These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematicmore » approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.« less
Compiled MPI: Cost-Effective Exascale Applications Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Quinlan, D; Lumsdaine, A
2012-04-10
The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardwaremore » procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.« less
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Ding, Zhaoxiong; Zhang, Dongying; Wang, Guanghui; Tang, Minghui; Dong, Yumin; Zhang, Yixin; Ho, Ho-Pui; Zhang, Xuping
2016-09-21
In this paper, an in-line, low-cost, miniature and portable spectrophotometric detection system is presented and used for fast protein determination and calibration in centrifugal microfluidics. Our portable detection system is configured with paired emitter and detector diodes (PEDD), where the light beam between both LEDs is collimated with enhanced system tolerance. It is the first time that a physical model of PEDD is clearly presented, which could be modelled as a photosensitive RC oscillator. A portable centrifugal microfluidic system that contains a wireless port in real-time communication with a smartphone has been built to show that PEDD is an effective strategy for conducting rapid protein bioassays with detection performance comparable to that of a UV-vis spectrophotometer. The choice of centrifugal microfluidics offers the unique benefits of highly parallel fluidic actuation at high accuracy while there is no need for a pump, as inertial forces are present within the entire spinning disc and accurately controlled by varying the spinning speed. As a demonstration experiment, we have conducted the Bradford assay for bovine serum albumin (BSA) concentration calibration from 0 to 2 mg mL(-1). Moreover, a novel centrifugal disc with a spiral microchannel is proposed for automatic distribution and metering of the sample to all the parallel reactions at one time. The reported lab-on-a-disc scheme with PEDD detection may offer a solution for high-throughput assays, such as protein density calibration, drug screening and drug solubility measurement that require the handling of a large number of reactions in parallel.
NASA Astrophysics Data System (ADS)
Zhao, Yan; Burger, William R.; Zhou, Mingwei; Pogue, Brian W.; Paulsen, Keith D.; Jiang, Shudong
2017-02-01
A portable, 12-wavelength hybrid frequency domain (FD) and continuous wave (CW) near-infrared spectral tomography (NIRST) system was developed for efficient characterization of breast cancer in a clinical oncology setting. Two sets of three FD and three CW measurements were acquired simultaneously. The imaging time was reduced from 90 to 55 seconds with a new gain adjustment scheme of the optical detector. The study of integrating this system into the workflow of clinical oncology practice is ongoing.
Perceptual other-race training reduces implicit racial bias.
Lebrecht, Sophie; Pierce, Lara J; Tarr, Michael J; Tanaka, James W
2009-01-01
Implicit racial bias denotes socio-cognitive attitudes towards other-race groups that are exempt from conscious awareness. In parallel, other-race faces are more difficult to differentiate relative to own-race faces--the "Other-Race Effect." To examine the relationship between these two biases, we trained Caucasian subjects to better individuate other-race faces and measured implicit racial bias for those faces both before and after training. Two groups of Caucasian subjects were exposed equally to the same African American faces in a training protocol run over 5 sessions. In the individuation condition, subjects learned to discriminate between African American faces. In the categorization condition, subjects learned to categorize faces as African American or not. For both conditions, both pre- and post-training we measured the Other-Race Effect using old-new recognition and implicit racial biases using a novel implicit social measure--the "Affective Lexical Priming Score" (ALPS). Subjects in the individuation condition, but not in the categorization condition, showed improved discrimination of African American faces with training. Concomitantly, subjects in the individuation condition, but not the categorization condition, showed a reduction in their ALPS. Critically, for the individuation condition only, the degree to which an individual subject's ALPS decreased was significantly correlated with the degree of improvement that subject showed in their ability to differentiate African American faces. Our results establish a causal link between the Other-Race Effect and implicit racial bias. We demonstrate that training that ameliorates the perceptual Other-Race Effect also reduces socio-cognitive implicit racial bias. These findings suggest that implicit racial biases are multifaceted, and include malleable perceptual skills that can be modified with relatively little training.
Block Preconditioning to Enable Physics-Compatible Implicit Multifluid Plasma Simulations
NASA Astrophysics Data System (ADS)
Phillips, Edward; Shadid, John; Cyr, Eric; Miller, Sean
2017-10-01
Multifluid plasma simulations involve large systems of partial differential equations in which many time-scales ranging over many orders of magnitude arise. Since the fastest of these time-scales may set a restrictively small time-step limit for explicit methods, the use of implicit or implicit-explicit time integrators can be more tractable for obtaining dynamics at time-scales of interest. Furthermore, to enforce properties such as charge conservation and divergence-free magnetic field, mixed discretizations using volume, nodal, edge-based, and face-based degrees of freedom are often employed in some form. Together with the presence of stiff modes due to integrating over fast time-scales, the mixed discretization makes the required linear solves for implicit methods particularly difficult for black box and monolithic solvers. This work presents a block preconditioning strategy for multifluid plasma systems that segregates the linear system based on discretization type and approximates off-diagonal coupling in block diagonal Schur complement operators. By employing multilevel methods for the block diagonal subsolves, this strategy yields algorithmic and parallel scalability which we demonstrate on a range of problems.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Sierra/Solid Mechanics 4.48 User's Guide.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Merewether, Mark Thomas; Crane, Nathan K; de Frias, Gabriel Jose
Sierra/SolidMechanics (Sierra/SM) is a Lagrangian, three-dimensional code for finite element analysis of solids and structures. It provides capabilities for explicit dynamic, implicit quasistatic and dynamic analyses. The explicit dynamics capabilities allow for the efficient and robust solution of models with extensive contact subjected to large, suddenly applied loads. For implicit problems, Sierra/SM uses a multi-level iterative solver, which enables it to effectively solve problems with large deformations, nonlinear material behavior, and contact. Sierra/SM has a versatile library of continuum and structural elements, and a large library of material models. The code is written for parallel computing environments enabling scalable solutionsmore » of extremely large problems for both implicit and explicit analyses. It is built on the SIERRA Framework, which facilitates coupling with other SIERRA mechanics codes. This document describes the functionality and input syntax for Sierra/SM.« less
Strategic processing in long-term repetition priming in the lexical decision task.
Kessler, Yoav; Moscovitch, Morris
2013-04-01
In a lexical decision task, faster reaction times (RTs) for old than new items is taken as evidence for an implicit memory involvement in this task. In contrast, the present study shows the involvement of both implicit and explicit memory in repetition priming. We propose a dual route model, in which lexical decisions can be made using one of two parallel processing routes: a lexical route, in which the lexical properties of the stimulus are used to determine whether it is a word or not, and a strategic route that builds on the inherent correlation between "wordness" and "oldness" in the experiment. Eliminating the strategic route by removing this correlation diminishes the priming effect at the slow end of the RT distribution, but not at the fast end. This dissociation is interpreted as evidence for the involvement of both implicit and explicit memory in repetition priming.
NASA Astrophysics Data System (ADS)
Lv, X.; Zhao, Y.; Huang, X. Y.; Xia, G. H.; Su, X. H.
2007-07-01
A new three-dimensional (3D) matrix-free implicit unstructured multigrid finite volume (FV) solver for structural dynamics is presented in this paper. The solver is first validated using classical 2D and 3D cantilever problems. It is shown that very accurate predictions of the fundamental natural frequencies of the problems can be obtained by the solver with fast convergence rates. This method has been integrated into our existing FV compressible solver [X. Lv, Y. Zhao, et al., An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3d unsteady compressible flows with moving objects, Journal of Computational Physics 215(2) (2006) 661-690] based on the immersed membrane method (IMM) [X. Lv, Y. Zhao, et al., as mentioned above]. Results for the interaction between the fluid and an immersed fixed-free cantilever are also presented to demonstrate the potential of this integrated fluid-structure interaction approach.
The BLAZE language - A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1987-01-01
A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
Group implicit concurrent algorithms in nonlinear structural dynamics
NASA Technical Reports Server (NTRS)
Ortiz, M.; Sotelino, E. D.
1989-01-01
During the 70's and 80's, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problems are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of applications are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithms development area in recent years has been the advent of parallel computers with multiprocessing capabilities. So, this work is mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation in concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, or Group Implicit algorithms is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse grain as well as medium grain parallel computers.
Song, Sunbin; Sharma, Nikhil; Buch, Ethan R.
2012-01-01
We value skills we have learned intentionally, but equally important are skills acquired incidentally without ability to describe how or what is learned, referred to as implicit. Randomized practice schedules are superior to grouped schedules for long-term skill gained intentionally, but its relevance for implicit learning is not known. In a parallel design, we studied healthy subjects who learned a motor sequence implicitly under randomized or grouped practice schedule and obtained diffusion-weighted images to identify white matter microstructural correlates of long-term skill. Randomized practice led to superior long-term skill compared with grouped practice. Whole-brain analyses relating interindividual variability in fractional anisotropy (FA) to long-term skill demonstrated that 1) skill in randomized learners correlated with FA within the corticostriatal tract connecting left sensorimotor cortex to posterior putamen, while 2) skill in grouped learners correlated with FA within the right forceps minor connecting homologous regions of the prefrontal cortex (PFC) and the corticostriatal tract connecting lateral PFC to anterior putamen. These results demonstrate first that randomized practice schedules improve long-term implicit skill more than grouped practice schedules and, second, that the superior skill acquired through randomized practice can be related to white matter microstructure in the sensorimotor corticostriatal network. PMID:21914632
Stefanutti, Luca; Robusto, Egidio; Vianello, Michelangelo; Anselmi, Pasquale
2013-06-01
A formal model is proposed that decomposes the implicit association test (IAT) effect into three process components: stimuli discrimination, automatic association, and termination criterion. Both response accuracy and reaction time are considered. Four independent and parallel Poisson processes, one for each of the four label categories of the IAT, are assumed. The model parameters are the rate at which information accrues on the counter of each process and the amount of information that is needed before a response is given. The aim of this study is to present the model and an illustrative application in which the process components of a Coca-Pepsi IAT are decomposed.
Huang, Chih-Chung; Lee, Po-Yang; Chen, Pay-Yu; Liu, Ting-Yu
2012-01-01
Blood flow measurement using Doppler ultrasound has become a useful tool for diagnosing cardiovascular diseases and as a physiological monitor. Recently, pocket-sized ultrasound scanners have been introduced for portable diagnosis. The present paper reports the implementation of a portable ultrasound pulsed-wave (PW) Doppler flowmeter using a smartphone. A 10-MHz ultrasonic surface transducer was designed for the dynamic monitoring of blood flow velocity. The directional baseband Doppler shift signals were obtained using a portable analog circuit system. After hardware processing, the Doppler signals were fed directly to a smartphone for Doppler spectrogram analysis and display in real time. To the best of our knowledge, this is the first report of the use of this system for medical ultrasound Doppler signal processing. A Couette flow phantom, consisting of two parallel disks with a 2-mm gap, was used to evaluate and calibrate the device. Doppler spectrograms of porcine blood flow were measured using this stand-alone portable device under the pulsatile condition. Subsequently, in vivo portable system verification was performed by measuring the arterial blood flow of a rat and comparing the results with the measurement from a commercial ultrasound duplex scanner. All of the results demonstrated the potential for using a smartphone as a novel embedded system for portable medical ultrasound applications. © 2012 IEEE
Design and Fabrication of Multifunctional Portable Bi2Te3-Based Thermoelectric Camping Lamp
NASA Astrophysics Data System (ADS)
Zhou, Yi; Li, Gongping
2018-05-01
Camping lamps have been widely used in the lighting, power supply, and intelligent electronic equipment fields. However, applications of traditional chemical and solar camping lamps are largely limited by the physical size of the source and operating conditions. A new prototype multifunctional portable Bi2Te3-based thermoelectric camping lamp (TECL) has been designed and fabricated. Ten parallel light-emitting diodes were lit directly by a Bi2Te3-based thermoelectric generator (TEG). The highest short-circuit current of 0.38 A and open-circuit voltage of 4.2 V were obtained at temperature difference of 115 K. This TECL is attractive for use in multifunctional and extreme applications as it integrates a portable heat source, high-performance TEG, and power management unit.
Design and Fabrication of Multifunctional Portable Bi2Te3-Based Thermoelectric Camping Lamp
NASA Astrophysics Data System (ADS)
Zhou, Yi; Li, Gongping
2018-07-01
Camping lamps have been widely used in the lighting, power supply, and intelligent electronic equipment fields. However, applications of traditional chemical and solar camping lamps are largely limited by the physical size of the source and operating conditions. A new prototype multifunctional portable Bi2Te3-based thermoelectric camping lamp (TECL) has been designed and fabricated. Ten parallel light-emitting diodes were lit directly by a Bi2Te3-based thermoelectric generator (TEG). The highest short-circuit current of 0.38 A and open-circuit voltage of 4.2 V were obtained at temperature difference of 115 K. This TECL is attractive for use in multifunctional and extreme applications as it integrates a portable heat source, high-performance TEG, and power management unit.
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-byblocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented onmore » both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Choleskyby- blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.« less
Polymerase chain reaction system
Benett, William J.; Richards, James B.; Stratton, Paul L.; Hadley, Dean R.; Milanovich, Fred P.; Belgrader, Phil; Meyer, Peter L.
2004-03-02
A portable polymerase chain reaction DNA amplification and detection system includes one or more chamber modules. Each module supports a duplex assay of a biological sample. Each module has two parallel interrogation ports with a linear optical system. The system is capable of being handheld.
Modeling Subsurface Reactive Flows Using Leadership-Class Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mills, Richard T; Hammond, Glenn; Lichtner, Peter
2009-01-01
We describe our experiences running PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - on leadership-class supercomputers, including initial experiences running on the petaflop incarnation of Jaguar, the Cray XT5 at the National Center for Computational Sciences at Oak Ridge National Laboratory. PFLOTRAN utilizes fully implicit time-stepping and is built on top of the Portable, Extensible Toolkit for Scientific Computation (PETSc). We discuss some of the hurdles to 'at scale' performance with PFLOTRAN and the progress we have made in overcoming them on leadership-class computer architectures.
The science of computing - The evolution of parallel processing
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology which impedes currently an effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Bilingual parallel programming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.; Overbeek, R.
1990-01-01
Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach providesmore » and effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.« less
Targeting multiple heterogeneous hardware platforms with OpenCL
NASA Astrophysics Data System (ADS)
Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.
2014-06-01
The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
Can a continuum solvent model reproduce the free energy landscape of a -hairpin folding in water?
NASA Astrophysics Data System (ADS)
Zhou, Ruhong; Berne, Bruce J.
2002-10-01
The folding free energy landscape of the C-terminal -hairpin of protein G is explored using the surface-generalized Born (SGB) implicit solvent model, and the results are compared with the landscape from an earlier study with explicit solvent model. The OPLSAA force field is used for the -hairpin in both implicit and explicit solvent simulations, and the conformational space sampling is carried out with a highly parallel replica-exchange method. Surprisingly, we find from exhaustive conformation space sampling that the free energy landscape from the implicit solvent model is quite different from that of the explicit solvent model. In the implicit solvent model some nonnative states are heavily overweighted, and more importantly, the lowest free energy state is no longer the native -strand structure. An overly strong salt-bridge effect between charged residues (E42, D46, D47, E56, and K50) is found to be responsible for this behavior in the implicit solvent model. Despite this, we find that the OPLSAA/SGB energies of all the nonnative structures are higher than that of the native structure; thus the OPLSAA/SGB energy is still a good scoring function for structure prediction for this -hairpin. Furthermore, the -hairpin population at 282 K is found to be less than 40% from the implicit solvent model, which is much smaller than the 72% from the explicit solvent model and 80% from experiment. On the other hand, both implicit and explicit solvent simulations with the OPLSAA force field exhibit no meaningful helical content during the folding process, which is in contrast to some very recent studies using other force fields.
Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?
Zhou, Ruhong; Berne, Bruce J.
2002-01-01
The folding free energy landscape of the C-terminal β-hairpin of protein G is explored using the surface-generalized Born (SGB) implicit solvent model, and the results are compared with the landscape from an earlier study with explicit solvent model. The OPLSAA force field is used for the β-hairpin in both implicit and explicit solvent simulations, and the conformational space sampling is carried out with a highly parallel replica-exchange method. Surprisingly, we find from exhaustive conformation space sampling that the free energy landscape from the implicit solvent model is quite different from that of the explicit solvent model. In the implicit solvent model some nonnative states are heavily overweighted, and more importantly, the lowest free energy state is no longer the native β-strand structure. An overly strong salt-bridge effect between charged residues (E42, D46, D47, E56, and K50) is found to be responsible for this behavior in the implicit solvent model. Despite this, we find that the OPLSAA/SGB energies of all the nonnative structures are higher than that of the native structure; thus the OPLSAA/SGB energy is still a good scoring function for structure prediction for this β-hairpin. Furthermore, the β-hairpin population at 282 K is found to be less than 40% from the implicit solvent model, which is much smaller than the 72% from the explicit solvent model and ≈80% from experiment. On the other hand, both implicit and explicit solvent simulations with the OPLSAA force field exhibit no meaningful helical content during the folding process, which is in contrast to some very recent studies using other force fields. PMID:12242327
Zhou, Ruhong; Berne, Bruce J
2002-10-01
The folding free energy landscape of the C-terminal beta-hairpin of protein G is explored using the surface-generalized Born (SGB) implicit solvent model, and the results are compared with the landscape from an earlier study with explicit solvent model. The OPLSAA force field is used for the beta-hairpin in both implicit and explicit solvent simulations, and the conformational space sampling is carried out with a highly parallel replica-exchange method. Surprisingly, we find from exhaustive conformation space sampling that the free energy landscape from the implicit solvent model is quite different from that of the explicit solvent model. In the implicit solvent model some nonnative states are heavily overweighted, and more importantly, the lowest free energy state is no longer the native beta-strand structure. An overly strong salt-bridge effect between charged residues (E42, D46, D47, E56, and K50) is found to be responsible for this behavior in the implicit solvent model. Despite this, we find that the OPLSAA/SGB energies of all the nonnative structures are higher than that of the native structure; thus the OPLSAA/SGB energy is still a good scoring function for structure prediction for this beta-hairpin. Furthermore, the beta-hairpin population at 282 K is found to be less than 40% from the implicit solvent model, which is much smaller than the 72% from the explicit solvent model and approximately equal 80% from experiment. On the other hand, both implicit and explicit solvent simulations with the OPLSAA force field exhibit no meaningful helical content during the folding process, which is in contrast to some very recent studies using other force fields.
Semantic Language Extensions for Implicit Parallel Programming
2013-09-01
mobile CPU interacts with a GPU on the same device and a cloud based backend at a remote location presents endless possibilities for solving com...for his contribution to the compiler infrastructure . His creativity in solving research problems and expertise in architecting and implementing...92 5.5.1 Frontend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.5.2 Backend
NASA Astrophysics Data System (ADS)
Caplan, R. M.; Mikić, Z.; Linker, J. A.; Lionello, R.
2017-05-01
We explore the performance and advantages/disadvantages of using unconditionally stable explicit super time-stepping (STS) algorithms versus implicit schemes with Krylov solvers for integrating parabolic operators in thermodynamic MHD models of the solar corona. Specifically, we compare the second-order Runge-Kutta Legendre (RKL2) STS method with the implicit backward Euler scheme computed using the preconditioned conjugate gradient (PCG) solver with both a point-Jacobi and a non-overlapping domain decomposition ILU0 preconditioner. The algorithms are used to integrate anisotropic Spitzer thermal conduction and artificial kinematic viscosity at time-steps much larger than classic explicit stability criteria allow. A key component of the comparison is the use of an established MHD model (MAS) to compute a real-world simulation on a large HPC cluster. Special attention is placed on the parallel scaling of the algorithms. It is shown that, for a specific problem and model, the RKL2 method is comparable or surpasses the implicit method with PCG solvers in performance and scaling, but suffers from some accuracy limitations. These limitations, and the applicability of RKL methods are briefly discussed.
Mapping implicit spectral methods to distributed memory architectures
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Vanrosendale, John
1991-01-01
Spectral methods were proven invaluable in numerical simulation of PDEs (Partial Differential Equations), but the frequent global communication required raises a fundamental barrier to their use on highly parallel architectures. To explore this issue, a 3-D implicit spectral method was implemented on an Intel hypercube. Utilization of about 50 percent was achieved on a 32 node iPSC/860 hypercube, for a 64 x 64 x 64 Fourier-spectral grid; finer grids yield higher utilizations. Chebyshev-spectral grids are more problematic, since plane-relaxation based multigrid is required. However, by using a semicoarsening multigrid algorithm, and by relaxing all multigrid levels concurrently, relatively high utilizations were also achieved in this harder case.
NASA Astrophysics Data System (ADS)
Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin
2016-06-01
CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.
AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics
NASA Astrophysics Data System (ADS)
Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.
2017-05-01
We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.
Implicit Coupling Approach for Simulation of Charring Carbon Ablators
NASA Technical Reports Server (NTRS)
Chen, Yih-Kanq; Gokcen, Tahir
2013-01-01
This study demonstrates that coupling of a material thermal response code and a flow solver with nonequilibrium gas/surface interaction for simulation of charring carbon ablators can be performed using an implicit approach. The material thermal response code used in this study is the three-dimensional version of Fully Implicit Ablation and Thermal response program, which predicts charring material thermal response and shape change on hypersonic space vehicles. The flow code solves the reacting Navier-Stokes equations using Data Parallel Line Relaxation method. Coupling between the material response and flow codes is performed by solving the surface mass balance in flow solver and the surface energy balance in material response code. Thus, the material surface recession is predicted in flow code, and the surface temperature and pyrolysis gas injection rate are computed in material response code. It is demonstrated that the time-lagged explicit approach is sufficient for simulations at low surface heating conditions, in which the surface ablation rate is not a strong function of the surface temperature. At elevated surface heating conditions, the implicit approach has to be taken, because the carbon ablation rate becomes a stiff function of the surface temperature, and thus the explicit approach appears to be inappropriate resulting in severe numerical oscillations of predicted surface temperature. Implicit coupling for simulation of arc-jet models is performed, and the predictions are compared with measured data. Implicit coupling for trajectory based simulation of Stardust fore-body heat shield is also conducted. The predicted stagnation point total recession is compared with that predicted using the chemical equilibrium surface assumption
A high-speed linear algebra library with automatic parallelism
NASA Technical Reports Server (NTRS)
Boucher, Michael L.
1994-01-01
Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Independence between implicit and explicit processing as revealed by the Simon effect.
Lo, Shih-Yu; Yeh, Su-Ling
2011-09-01
Studies showing human behavior influenced by subliminal stimuli mainly focus on implicit processing per se, and little is known about its interaction with explicit processing. We examined this by using the Simon effect, wherein a task-irrelevant spatial distracter interferes with lateralized response. Lo and Yeh (2008) found that the visual Simon effect, although it occurred when participants were aware of the visual distracters, did not occur with subliminal visual distracters. We used the same paradigm and examined whether subliminal and supra-threshold stimuli are processed independently by adding a supra-threshold auditory distracter to ascertain whether it would interact with the subliminal visual distracter. Results showed auditory Simon effect, but there was still no visual Simon effect, indicating that supra-threshold and subliminal stimuli are processed separately in independent streams. In contrast to the traditional view that implicit processing precedes explicit processing, our results suggest that they operate independently in a parallel fashion. Copyright © 2010 Elsevier Inc. All rights reserved.
Towards full-Braginskii implicit extended MHD
NASA Astrophysics Data System (ADS)
Chacon, Luis
2009-05-01
Recently, viable algorithms have been proposed for the scalable, fully-implicit temporal integration of 3D resistive MHD and cold-ion extended MHD models. While significant, these achievements must be tempered by the fact that such models lack predictive capabilities in regimes of interest for magnetic fusion. Short of including kinetic closures, a natural evolution path towards predictability starts by considering additional terms as described in Braginskii's fluid closures in the collisional regime. Here, we focus on the inclusion of two fundamental elements of relevance for fusion plasmas: anisotropic parallel electron transport, and warm-ion physics (i.e., ion finite Larmor radius effects, included via gyroviscosity). Both these elements introduce significant numerical difficulties, due to the strong anisotropy in the former, and the presence of dispersive waves in the latter. In this presentation, we will discuss progress in our fully implicit algorithmic formulation towards the inclusion of both these elements. L. Chac'on, Phys. Plasmas, 15, 056103 (2008) L. Chac'on, J. Physics: Conf. Series, 125, 012041 (2008)
A low-cost and portable realization on fringe projection three-dimensional measurement
NASA Astrophysics Data System (ADS)
Xiao, Suzhi; Tao, Wei; Zhao, Hui
2015-12-01
Fringe projection three-dimensional measurement is widely applied in a wide range of industrial application. The traditional fringe projection system has the disadvantages of high expense, big size, and complicated calibration requirements. In this paper we introduce a low-cost and portable realization on three-dimensional measurement with Pico projector. It has the advantages of low cost, compact physical size, and flexible configuration. For the proposed fringe projection system, there is no restriction to camera and projector's relative alignment on parallelism and perpendicularity for installation. Moreover, plane-based calibration method is adopted in this paper that avoids critical requirements on calibration system such as additional gauge block or precise linear z stage. What is more, error sources existing in the proposed system are introduced in this paper. The experimental results demonstrate the feasibility of the proposed low cost and portable fringe projection system.
Comparison of soil pollution concentrations determined using AAS and portable XRF techniques.
Radu, Tanja; Diamond, Dermot
2009-11-15
Past mining activities in the area of Silvermines, Ireland, have resulted in heavily polluted soils. The possibility of spreading pollution to the surrounding areas through dust blow-offs poses a potential threat for the local communities. Conventional environmental soil and dust analysis techniques are very slow and laborious and consequently there is a need for fast and accurate analytical methods, which can provide real-time in situ pollution mapping. Laboratory-based aqua regia acid digestion of the soil samples collected in the area followed by the atomic absorption spectrophotometry (AAS) analysis confirmed very high pollution, especially by Pb, As, Cu, and Zn. In parallel, samples were analyzed using portable X-ray fluorescence radioisotope and miniature tube powered (XRF) NITON instruments and their performance was compared. Overall, the portable XRF instrument gave excellent correlation with the laboratory-based reference AAS method.
Iran's Implicit Philosophy of Education
ERIC Educational Resources Information Center
Bagheri Noaparast, Khosrow
2018-01-01
This paper aims to extract Iran's philosophy of education from two sources of the constitution and the course of practice in educational institutions. Regarding the first source, it is argued that parallel to the two main threads of the constitution, Iran's main elements of philosophy of education are expected to be derived from; (1) Islam and (2)…
USDA-ARS?s Scientific Manuscript database
AgroEcoSystem-Watershed (AgES-W) is a modular, Java-based spatially distributed model which implements hydrologic and water quality (H/WQ) simulation components under the Java Connection Framework (JCF) and the Object Modeling System (OMS) environmental modeling framework. AgES-W is implicitly scala...
Sleep Benefits in Parallel Implicit and Explicit Measures of Episodic Memory
ERIC Educational Resources Information Center
Weber, Frederik D.; Wang, Jing-Yi; Born, Jan; Inostroza, Marion
2014-01-01
Research in rats using preferences during exploration as a measure of memory has indicated that sleep is important for the consolidation of episodic-like memory, i.e., memory for an event bound into specific spatio-temporal context. How these findings relate to human episodic memory is unclear. We used spontaneous preferences during visual…
rfpipe: Radio interferometric transient search pipeline
NASA Astrophysics Data System (ADS)
Law, Casey J.
2017-10-01
rfpipe supports Python-based analysis of radio interferometric data (especially from the Very Large Array) and searches for fast radio transients. This extends on the rtpipe library (ascl:1706.002) with new approaches to parallelization, acceleration, and more portable data products. rfpipe can run in standalone mode or be in a cluster environment.
An interactive parallel programming environment applied in atmospheric science
NASA Technical Reports Server (NTRS)
vonLaszewski, G.
1996-01-01
This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Portable instrument for inspecting irradiated nuclear fuel assemblies
Nicholson, Nicholas; Dowdy, Edward J.; Holt, David M.; Stump, Jr., Charles J.
1985-01-01
A portable instrument for measuring induced Cerenkov radiation associated with irradiated nuclear fuel assemblies in a water-filled storage pond is disclosed. The instrument includes a photomultiplier tube and an image intensifier which are operable in parallel and simultaneously by means of a field lens assembly and an associated beam splitter. The image intensifier permits an operator to aim and focus the apparatus on a submerged fuel assembly. Once the instrument is aimed and focused, an illumination reading can be obtained with the photomultiplier tube. The instrument includes a lens cap with a carbon-14/phosphor light source for calibrating the apparatus in the field.
Advances in developing rapid, reliable and portable detection systems for alcohol.
Thungon, Phurpa Dema; Kakoti, Ankana; Ngashangva, Lightson; Goswami, Pranab
2017-11-15
Development of portable, reliable, sensitive, simple, and inexpensive detection system for alcohol has been an instinctive demand not only in traditional brewing, pharmaceutical, food and clinical industries but also in rapidly growing alcohol based fuel industries. Highly sensitive, selective, and reliable alcohol detections are currently amenable typically through the sophisticated instrument based analyses confined mostly to the state-of-art analytical laboratory facilities. With the growing demand of rapid and reliable alcohol detection systems, an all-round attempt has been made over the past decade encompassing various disciplines from basic and engineering sciences. Of late, the research for developing small-scale portable alcohol detection system has been accelerated with the advent of emerging miniaturization techniques, advanced materials and sensing platforms such as lab-on-chip, lab-on-CD, lab-on-paper etc. With these new inter-disciplinary approaches along with the support from the parallel knowledge growth on rapid detection systems being pursued for various targets, the progress on translating the proof-of-concepts to commercially viable and environment friendly portable alcohol detection systems is gaining pace. Here, we summarize the progress made over the years on the alcohol detection systems, with a focus on recent advancement towards developing portable, simple and efficient alcohol sensors. Copyright © 2017 Elsevier B.V. All rights reserved.
Computing NLTE Opacities -- Node Level Parallel Calculation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holladay, Daniel
Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities inline with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, T.
Progress of parallel/vector computers has driven us to develop suitable numerical integrators utilizing their computational power to the full extent while being independent on the size of system to be integrated. Unfortunately, the parallel version of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems. While the vector-mode usage of Picard-Chebyshev method (Fukushima 1997a, 1997b) will lead the acceleration factor of order of 1000 for smooth problems such as planetary/satellites orbit integration. The success of multiple-correction PECE mode of time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to enlighten Milankar's so-called "pipelined predictor corrector method", which is expected to lead an acceleration factor of 3-4. We will review these directions and discuss future prospects.
Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)
1997-01-01
In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
NASA Astrophysics Data System (ADS)
Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar
2017-12-01
We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown being sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domain of several millions squared kilometers like the Arctic region.
Scalable Implementation of Finite Elements by NASA _ Implicit (ScIFEi)
NASA Technical Reports Server (NTRS)
Warner, James E.; Bomarito, Geoffrey F.; Heber, Gerd; Hochhalter, Jacob D.
2016-01-01
Scalable Implementation of Finite Elements by NASA (ScIFEN) is a parallel finite element analysis code written in C++. ScIFEN is designed to provide scalable solutions to computational mechanics problems. It supports a variety of finite element types, nonlinear material models, and boundary conditions. This report provides an overview of ScIFEi (\\Sci-Fi"), the implicit solid mechanics driver within ScIFEN. A description of ScIFEi's capabilities is provided, including an overview of the tools and features that accompany the software as well as a description of the input and output le formats. Results from several problems are included, demonstrating the efficiency and scalability of ScIFEi by comparing to finite element analysis using a commercial code.
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ye; Ma, Xiaosong; Liu, Qing Gary
2015-01-01
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time-and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters tomore » create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.« less
MPI-IO: A Parallel File I/O Interface for MPI Version 0.3
NASA Technical Reports Server (NTRS)
Corbett, Peter; Feitelson, Dror; Hsu, Yarsun; Prost, Jean-Pierre; Snir, Marc; Fineberg, Sam; Nitzberg, Bill; Traversat, Bernard; Wong, Parkson
1995-01-01
Thanks to MPI [9], writing portable message passing parallel programs is almost a reality. One of the remaining problems is file I/0. Although parallel file systems support similar interfaces, the lack of a standard makes developing a truly portable program impossible. Further, the closest thing to a standard, the UNIX file interface, is ill-suited to parallel computing. Working together, IBM Research and NASA Ames have drafted MPI-I0, a proposal to address the portable parallel I/0 problem. In a nutshell, this proposal is based on the idea that I/0 can be modeled as message passing: writing to a file is like sending a message, and reading from a file is like receiving a message. MPI-IO intends to leverage the relatively wide acceptance of the MPI interface in order to create a similar I/0 interface. The above approach can be materialized in different ways. The current proposal represents the result of extensive discussions (and arguments), but is by no means finished. Many changes can be expected as additional participants join the effort to define an interface for portable I/0. This document is organized as follows. The remainder of this section includes a discussion of some issues that have shaped the style of the interface. Section 2 presents an overview of MPI-IO as it is currently defined. It specifies what the interface currently supports and states what would need to be added to the current proposal to make the interface more complete and robust. The next seven sections contain the interface definition itself. Section 3 presents definitions and conventions. Section 4 contains functions for file control, most notably open. Section 5 includes functions for independent I/O, both blocking and nonblocking. Section 6 includes functions for collective I/O, both blocking and nonblocking. Section 7 presents functions to support system-maintained file pointers, and shared file pointers. Section 8 presents constructors that can be used to define useful filetypes (the role of filetypes is explained in Section 2 below). Section 9 presents how the error handling mechanism of MPI is supported by the MPI-IO interface. All this is followed by a set of appendices, which contain information about issues that have not been totally resolved yet, and about design considerations. The reader can find there the motivation behind some of our design choices. More information on this would definitely be welcome and will be included in a further release of this document. The first appendix contains a description of MPI-I0's 'hints' structure which is used when opening a file. Appendix B is a discussion of various issues in the support for file pointers. Appendix C explains what we mean in talking about atomic access. Appendix D provides detailed examples of filetype constructors, and Appendix E contains a collection of arguments for and against various design decisions.
Multigrid treatment of implicit continuum diffusion
NASA Astrophysics Data System (ADS)
Francisquez, Manaure; Zhu, Ben; Rogers, Barrett
2017-10-01
Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
The implementation of an aeronautical CFD flow code onto distributed memory parallel systems
NASA Astrophysics Data System (ADS)
Ierotheou, C. S.; Forsey, C. R.; Leatham, M.
2000-04-01
The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright
Methodology of modeling and measuring computer architectures for plasma simulations
NASA Technical Reports Server (NTRS)
Wang, L. P. T.
1977-01-01
A brief introduction to plasma simulation using computers and the difficulties on currently available computers is given. Through the use of an analyzing and measuring methodology - SARA, the control flow and data flow of a particle simulation model REM2-1/2D are exemplified. After recursive refinements the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential type simulation model, an array/pipeline type simulation model, and a fully parallel simulation model of a code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems which have implicitly parallel nature.
ERIC Educational Resources Information Center
Jaakkola, Tomi; Nurmi, Sami; Veermans, Koen
2011-01-01
The aim of this experimental study was to compare learning outcomes of students using a simulation alone (simulation environment) with outcomes of those using a simulation in parallel with real circuits (combination environment) in the domain of electricity, and to explore how learning outcomes in these environments are mediated by implicit (only…
Multiobjective Multifactorial Optimization in Evolutionary Multitasking.
Gupta, Abhishek; Ong, Yew-Soon; Feng, Liang; Tan, Kay Chen
2016-05-03
In recent decades, the field of multiobjective optimization has attracted considerable interest among evolutionary computation researchers. One of the main features that makes evolutionary methods particularly appealing for multiobjective problems is the implicit parallelism offered by a population, which enables simultaneous convergence toward the entire Pareto front. While a plethora of related algorithms have been proposed till date, a common attribute among them is that they focus on efficiently solving only a single optimization problem at a time. Despite the known power of implicit parallelism, seldom has an attempt been made to multitask, i.e., to solve multiple optimization problems simultaneously. It is contended that the notion of evolutionary multitasking leads to the possibility of automated transfer of information across different optimization exercises that may share underlying similarities, thereby facilitating improved convergence characteristics. In particular, the potential for automated transfer is deemed invaluable from the standpoint of engineering design exercises where manual knowledge adaptation and reuse are routine. Accordingly, in this paper, we present a realization of the evolutionary multitasking paradigm within the domain of multiobjective optimization. The efficacy of the associated evolutionary algorithm is demonstrated on some benchmark test functions as well as on a real-world manufacturing process design problem from the composites industry.
Efficiency and flexibility using implicit methods within atmosphere dycores
NASA Astrophysics Data System (ADS)
Evans, K. J.; Archibald, R.; Norman, M. R.; Gardner, D. J.; Woodward, C. S.; Worley, P.; Taylor, M.
2016-12-01
A suite of explicit and implicit methods are evaluated for a range of configurations of the shallow water dynamical core within the spectral-element Community Atmosphere Model (CAM-SE) to explore their relative computational performance. The configurations are designed to explore the attributes of each method under different but relevant model usage scenarios including varied spectral order within an element, static regional refinement, and scaling to large problem sizes. The limitations and benefits of using explicit versus implicit, with different discretizations and parameters, are discussed in light of trade-offs such as MPI communication, memory, and inherent efficiency bottlenecks. For the regionally refined shallow water configurations, the implicit BDF2 method is about the same efficiency as an explicit Runge-Kutta method, without including a preconditioner. Performance of the implicit methods with the residual function executed on a GPU is also presented; there is speed up for the residual relative to a CPU, but overwhelming transfer costs motivate moving more of the solver to the device. Given the performance behavior of implicit methods within the shallow water dynamical core, the recommendation for future work using implicit solvers is conditional based on scale separation and the stiffness of the problem. The strong growth of linear iterations with increasing resolution or time step size is the main bottleneck to computational efficiency. Within the hydrostatic dynamical core, of CAM-SE, we present results utilizing approximate block factorization preconditioners implemented using the Trilinos library of solvers. They reduce the cost of linear system solves and improve parallel scalability. We provide a summary of the remaining efficiency considerations within the preconditioner and utilization of the GPU, as well as a discussion about the benefits of a time stepping method that provides converged and stable solutions for a much wider range of time step sizes. As more complex model components, for example new physics and aerosols, are connected in the model, having flexibility in the time stepping will enable more options for combining and resolving multiple scales of behavior.
On extending parallelism to serial simulators
NASA Technical Reports Server (NTRS)
Nicol, David; Heidelberger, Philip
1994-01-01
This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount off simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
NASA Astrophysics Data System (ADS)
Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard
2017-03-01
Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite-difference and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code portability on different architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and Intel Xeon PHI. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitudes over a single threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models on several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers
NASA Technical Reports Server (NTRS)
Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.
2014-01-01
This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh re nement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes e ects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Ex- perimental validation against double cone experiments in hypersonic ow are shown. The adaptive mesh re nement shows promise for a staple test problem in the hypersonic com- munity. Extension to more advanced techniques for more complicated ows is described.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Nicholson, N.; Dowdy, E.J.; Holt, D.M.; Stump, C.J. Jr.
1982-05-13
A portable instrument for measuring induced Cerenkov radiation associated with irradiated nuclear fuel assemblies in a water-filled storage pond is disclosed. The instrument includes a photomultiplier tube and an image intensifier which are operable in parallel and simultaneously by means of a field lens assembly and an associated beam splitter. The image intensifier permits an operator to aim and focus the apparatus on a submerged fuel assembly. Once the instrument is aimed and focused, an illumination reading can be obtained with the photomultiplier tube. The instrument includes a lens cap with a carbon-14/phosphor light source for calibrating the apparatus in the field.
NASA Astrophysics Data System (ADS)
Xia, Yidong
The objective this work is to develop a parallel, implicit reconstructed discontinuous Galerkin (RDG) method using Taylor basis for the solution of the compressible Navier-Stokes equations on 3D hybrid grids. This third-order accurate RDG method is based on a hierarchical weighed essentially non- oscillatory reconstruction scheme, termed as HWENO(P1P 2) to indicate that a quadratic polynomial solution is obtained from the underlying linear polynomial DG solution via a hierarchical WENO reconstruction. The HWENO(P1P2) is designed not only to enhance the accuracy of the underlying DG(P1) method but also to ensure non-linear stability of the RDG method. In this reconstruction scheme, a quadratic polynomial (P2) solution is first reconstructed using a least-squares approach from the underlying linear (P1) discontinuous Galerkin solution. The final quadratic solution is then obtained using a Hermite WENO reconstruction, which is necessary to ensure the linear stability of the RDG method on 3D unstructured grids. The first derivatives of the quadratic polynomial solution are then reconstructed using a WENO reconstruction in order to eliminate spurious oscillations in the vicinity of strong discontinuities, thus ensuring the non-linear stability of the RDG method. The parallelization in the RDG method is based on a message passing interface (MPI) programming paradigm, where the METIS library is used for the partitioning of a mesh into subdomain meshes of approximately the same size. Both multi-stage explicit Runge-Kutta and simple implicit backward Euler methods are implemented for time advancement in the RDG method. In the implicit method, three approaches: analytical differentiation, divided differencing (DD), and automatic differentiation (AD) are developed and implemented to obtain the resulting flux Jacobian matrices. The automatic differentiation is a set of techniques based on the mechanical application of the chain rule to obtain derivatives of a function given as a computer program. By using an AD tool, the manpower can be significantly reduced for deriving the flux Jacobians, which can be quite complicated, tedious, and error-prone if done by hand or symbolic arithmetic software, depending on the complexity of the numerical flux scheme. In addition, the workload for code maintenance can also be largely reduced in case the underlying flux scheme is updated. The approximate system of linear equations arising from the Newton linearization is solved by the general minimum residual (GMRES) algorithm with lower-upper symmetric gauss-seidel (LUSGS) preconditioning. This GMRES+LU-SGS linear solver is the most robust and efficient for implicit time integration of the discretized Navier-Stokes equations when the AD-based flux Jacobians are provided other than the other two approaches. The developed HWENO(P1P2) method is used to compute a variety of well-documented compressible inviscid and viscous flow test cases on 3D hybrid grids, including some standard benchmark test cases such as the Sod shock tube, flow past a circular cylinder, and laminar flow past a at plate. The computed solutions are compared with either analytical solutions or experimental data, if available to assess the accuracy of the HWENO(P 1P2) method. Numerical results demonstrate that the HWENO(P 1P2) method is able to not only enhance the accuracy of the underlying HWENO(P1) method, but also ensure the linear and non-linear stability at the presence of strong discontinuities. An extensive study of grid convergence analysis on various types of elements: tetrahedron, prism, hexahedron, and hybrid prism/hexahedron, for a number of test cases indicates that the developed HWENO(P1P2) method is able to achieve the designed third-order accuracy of spatial convergence for smooth inviscid flows: one order higher than the underlying second-order DG(P1) method without significant increase in computing costs and storage requirements. The performance of the the developed GMRES+LU-SGS implicit method is compared with the multi-stage Runge-Kutta time stepping scheme for a number of test cases in terms of the timestep and CPU time. Numerical results indicate that the overall performance of the implicit method with AD-based Jacobians is order of magnitude better than the its explicit counterpart. Finally, a set of parallel scaling tests for both explicit and implicit methods is conducted on North Carolina State University's ARC cluster, demonstrating almost an ideal scalability of the RDG method. (Abstract shortened by UMI.)
Bastos, Francisco Inácio
2010-03-01
The philosopher Paul Feyerabend and Brazilian scientists Maurício da Rocha e Silva and Newton Freire-Maia were contemporaries and lived surrounded by the fundamental dilemnas of science. The anarchist proposal of Feyerabend, then embryonic, was formulated in parallel by Rocha e Silva in his criticism of the scientific method. Two decades later, Feyerabend's ideas seemed implicitly to stimulate Newton Freire-Maia in his reflections on science. The web of interrelationships in the ideas of these three men - who never interacted - touches on central issues for Brazilian science from 1960 to 1980, a period in which the latter is consolidated in a dialogue with the nascent reflection on science and the scientific method in Brazil.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
Sittig, D. F.; Orr, J. A.
1991-01-01
Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607
NASA Technical Reports Server (NTRS)
Goldstein, David
1991-01-01
Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great efforts into migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTool's ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.
Sleep benefits in parallel implicit and explicit measures of episodic memory.
Weber, Frederik D; Wang, Jing-Yi; Born, Jan; Inostroza, Marion
2014-03-14
Research in rats using preferences during exploration as a measure of memory has indicated that sleep is important for the consolidation of episodic-like memory, i.e., memory for an event bound into specific spatio-temporal context. How these findings relate to human episodic memory is unclear. We used spontaneous preferences during visual exploration and verbal recall as, respectively, implicit and explicit measures of memory, to study effects of sleep on episodic memory consolidation in humans. During encoding before 10-h retention intervals that covered nighttime sleep or daytime wakefulness, two groups of young adults were presented with two episodes that were 1-h apart. Each episode entailed a spatial configuration of four different faces in a 3 × 3 grid of locations. After the retention interval, implicit spatio-temporal recall performance was assessed by eye-tracking visual exploration of another configuration of four faces of which two were from the first and second episode, respectively; of the two faces one was presented at the same location as during encoding and the other at another location. Afterward explicit verbal recall was assessed. Measures of implicit and explicit episodic memory retention were positively correlated (r = 0.57, P < 0.01), and were both better after nighttime sleep than daytime wakefulness (P < 0.05). In the sleep group, implicit episodic memory recall was associated with increased fast spindles during nonrapid eye movement (NonREM) sleep (r = 0.62, P < 0.05). Together with concordant observations in rats our results indicate that consolidation of genuinely episodic memory benefits from sleep.
Sleep benefits in parallel implicit and explicit measures of episodic memory
Weber, Frederik D.; Wang, Jing-Yi; Born, Jan; Inostroza, Marion
2014-01-01
Research in rats using preferences during exploration as a measure of memory has indicated that sleep is important for the consolidation of episodic-like memory, i.e., memory for an event bound into specific spatio-temporal context. How these findings relate to human episodic memory is unclear. We used spontaneous preferences during visual exploration and verbal recall as, respectively, implicit and explicit measures of memory, to study effects of sleep on episodic memory consolidation in humans. During encoding before 10-h retention intervals that covered nighttime sleep or daytime wakefulness, two groups of young adults were presented with two episodes that were 1-h apart. Each episode entailed a spatial configuration of four different faces in a 3 × 3 grid of locations. After the retention interval, implicit spatio-temporal recall performance was assessed by eye-tracking visual exploration of another configuration of four faces of which two were from the first and second episode, respectively; of the two faces one was presented at the same location as during encoding and the other at another location. Afterward explicit verbal recall was assessed. Measures of implicit and explicit episodic memory retention were positively correlated (r = 0.57, P < 0.01), and were both better after nighttime sleep than daytime wakefulness (P < 0.05). In the sleep group, implicit episodic memory recall was associated with increased fast spindles during nonrapid eye movement (NonREM) sleep (r = 0.62, P < 0.05). Together with concordant observations in rats our results indicate that consolidation of genuinely episodic memory benefits from sleep. PMID:24634354
The Growth of m-Learning and the Growth of Mobile Computing: Parallel Developments
ERIC Educational Resources Information Center
Caudill, Jason G.
2007-01-01
m-Learning is made possible by the existence and application of mobile hardware and networking technology. By exploring the capabilities of these technologies, it is possible to construct a picture of how different components of m-Learning can be implemented. This paper will explore the major technologies currently in use: portable digital…
USDA-ARS?s Scientific Manuscript database
The use of field effect transistors (FETs) as the transduction element for the detection of DNA amplification reactions will enable portable and inexpensive nucleic acid analysis. Transistors used as biological sensors,or BioFETs, minimize the cost and size of detection platforms by leveraging fabri...
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
NASA Astrophysics Data System (ADS)
Hariri, F.; Tran, T. M.; Jocksch, A.; Lanti, E.; Progsch, J.; Messmer, P.; Brunner, S.; Gheller, C.; Villard, L.
2016-10-01
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphic Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the one on an Intel Sandy bridge 8-core CPU by a factor of 3.4.
Zhang, Mengliang; Kruse, Natalie A; Bowman, Jennifer R; Jackson, Glen P
2016-05-01
An expedited field analysis method was developed for the determination of polychlorinated biphenyls (PCBs) in soil matrices using a portable gas chromatography-mass spectrometry (GC-MS) instrument. Soil samples of approximately 0.5 g were measured with a portable scale and PCBs were extracted by headspace solid-phase microextraction (SPME) with a 100 µm polydimethylsiloxane (PDMS) fiber. Two milliliters of 0.2 M potassium permanganate and 0.5 mL of 6 M sulfuric acid solution were added to the soil matrices to facilitate the extraction of PCBs. The extraction was performed for 30 min at 100 ℃ in a portable heating block that was powered by a portable generator. The portable GC-MS instrument took less than 6 min per analysis and ran off an internal battery and helium cylinder. Six commercial PCB mixtures, Aroclor 1016, 1221, 1232, 1242, 1248, 1254, and 1260, could be classified based on the GC chromatograms and mass spectra. The detection limit of this method for Aroclor 1260 in soil matrices is approximately 10 ppm, which is sufficient for guiding remediation efforts in contaminated sites. This method was applicable to the on-site analysis of PCBs with a total analysis time of 37 min per sample. However, the total analysis time could be improved to less than 7 min per sample by conducting the rate-limiting extraction step for different samples in parallel. © The Author(s) 2016.
Expressing Parallelism with ROOT
NASA Astrophysics Data System (ADS)
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piparo, D.; Tejedor, E.; Guiraud, E.
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module inmore » Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.« less
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.
Higginson, J S; Neptune, R R; Anderson, F C
2005-09-01
Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
A convenient and accurate parallel Input/Output USB device for E-Prime.
Canto, Rosario; Bufalari, Ilaria; D'Ausilio, Alessandro
2011-03-01
Psychological and neurophysiological experiments require the accurate control of timing and synchrony for Input/Output signals. For instance, a typical Event-Related Potential (ERP) study requires an extremely accurate synchronization of stimulus delivery with recordings. This is typically done via computer software such as E-Prime, and fast communications are typically assured by the Parallel Port (PP). However, the PP is an old and disappearing technology that, for example, is no longer available on portable computers. Here we propose a convenient USB device enabling parallel I/O capabilities. We tested this device against the PP on both a desktop and a laptop machine in different stress tests. Our data demonstrate the accuracy of our system, which suggests that it may be a good substitute for the PP with E-Prime.
A transient-enhanced NMOS low dropout voltage regulator with parallel feedback compensation
NASA Astrophysics Data System (ADS)
Han, Wang; Lin, Tan
2016-02-01
This paper presents a transient-enhanced NMOS low-dropout regulator (LDO) for portable applications with parallel feedback compensation. The parallel feedback structure adds a dynamic zero to get an adequate phase margin with a load current variation from 0 to 1 A. A class-AB error amplifier and a fast charging/discharging unit are adopted to enhance the transient performance. The proposed LDO has been implemented in a 0.35 μm BCD process. From experimental results, the regulator can operate with a minimum dropout voltage of 150 mV at a maximum 1 A load and IQ of 165 μA. Under the full range load current step, the voltage undershoot and overshoot of the proposed LDO are reduced to 38 mV and 27 mV respectively.
Massively parallel quantum computer simulator
NASA Astrophysics Data System (ADS)
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
Portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele
2018-03-01
Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures
NASA Technical Reports Server (NTRS)
Cohl, H.
1994-01-01
We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving Elliptic PDE's, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication.
NASA Astrophysics Data System (ADS)
Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.
2010-12-01
Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a de-coupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinements for 2D and 3D structured and unstructured mesh. The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches, over de-coupled methods, for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.
2011-09-01
optimized building blocks such as a parallelized tri-diagonal linear solver (used in the “implicit finite differences ” and split-step Pade PE models...and Ding Lee. “A finite - difference treatment of interface conditions for the parabolic wave equation: The horizontal interface.” The Journal of the...Acoustical Society of America, 71(4):855, 1982. 3. Ding Lee and Suzanne T. McDaniel. “A finite - difference treatment of interface conditions for
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Yan, Jerry
1999-01-01
We present an HPF (High Performance Fortran) implementation of ARC3D code along with the profiling and performance data on SGI Origin 2000. Advantages and limitations of HPF as a parallel programming language for CFD applications are discussed. For achieving good performance results we used the data distributions optimized for implementation of implicit and explicit operators of the solver and boundary conditions. We compare the results with MPI and directive based implementations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mao, Shuai; Hu, Peng-Cheng, E-mail: hupc@hit.edu.cn; Ding, Xue-Mei, E-mail: X.M.Ding@outlook.com
A fiber-coupled displacement measuring interferometer capable of determining of the posture of a reflective surface of a measuring mirror is proposed. The newly constructed instrument combines fiber-coupled displacement and angular measurement technologies. The proposed interferometer has advantages of both the fiber-coupled and the spatially beam-separated interferometer. A portable dual-position sensitive detector (PSD)-based unit within this proposed interferometer measures the parallelism of the two source beams to guide the fiber-coupling adjustment. The portable dual PSD-based unit measures not only the pitch and yaw of the retro-reflector but also measures the posture of the reflective surface. The experimental results of displacement calibrationmore » show that the deviations between the proposed interferometer and a reference one, Agilent 5530, at two different common beam directions are both less than ±35 nm, thus verifying the effectiveness of the beam parallelism measurement. The experimental results of angular calibration show that deviations of pitch and yaw with the auto-collimator (as a reference) are less than ±2 arc sec, thus proving the proposed interferometer’s effectiveness for determination of the posture of a reflective surface.« less
NASA Astrophysics Data System (ADS)
Ji, X.; Shen, C.
2017-12-01
Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades to floodplains. This model applies semi-implicit, semi-Lagrangian (SISL) scheme in solving dynamic wave equations, and with the assistant of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in the area of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit
NASA Technical Reports Server (NTRS)
Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;
1998-01-01
Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
NASA Technical Reports Server (NTRS)
Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana
2016-01-01
In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructuredgrid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large tightly-coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
A scalable, fully implicit algorithm for the reduced two-field low-β extended MHD model
Chacon, Luis; Stanier, Adam John
2016-12-01
Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm ismore » shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.« less
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Poole, Eugene L.
1991-01-01
A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
2014-05-01
heating prediction to grid alignment along the shock . . . . . . . . 36 1-12 Large variation in heating predictions for 3D hypersonic flow over cylinder...100 4-12 Taylor Vortex problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4-13 Taylor Vortex problem: 3D ...149 6-16 3D contours for temperature, T for MIG and US3D for only O2 test case . . . . 150 6-17 Stagnation line plots for only
PETSc Users Manual Revision 3.7
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, Satish; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
PETSc Users Manual Revision 3.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simunovic, S.; Zacharia, T.; Baltas, N.
1995-04-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. Themore » Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.« less
Parallel computing in experimental mechanics and optical measurement: A review (II)
NASA Astrophysics Data System (ADS)
Wang, Tianyi; Kemao, Qian
2018-05-01
With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have successfully integrated into various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composing of hardware such as CPUs and GPUs, have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.
Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano
2014-09-09
A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
Smitha, K G; Vinod, A P
2015-11-01
Children with autism spectrum disorder have difficulty in understanding the emotional and mental states from the facial expressions of the people they interact. The inability to understand other people's emotions will hinder their interpersonal communication. Though many facial emotion recognition algorithms have been proposed in the literature, they are mainly intended for processing by a personal computer, which limits their usability in on-the-move applications where portability is desired. The portability of the system will ensure ease of use and real-time emotion recognition and that will aid for immediate feedback while communicating with caretakers. Principal component analysis (PCA) has been identified as the least complex feature extraction algorithm to be implemented in hardware. In this paper, we present a detailed study of the implementation of serial and parallel implementation of PCA in order to identify the most feasible method for realization of a portable emotion detector for autistic children. The proposed emotion recognizer architectures are implemented on Virtex 7 XC7VX330T FFG1761-3 FPGA. We achieved 82.3% detection accuracy for a word length of 8 bits.
A low cost, disposable cable-shaped Al-air battery for portable biosensors
NASA Astrophysics Data System (ADS)
Fotouhi, Gareth; Ogier, Caleb; Kim, Jong-Hoon; Kim, Sooyeun; Cao, Guozhong; Shen, Amy Q.; Kramlich, John; Chung, Jae-Hyun
2016-05-01
A disposable cable-shaped flexible battery is presented using a simple, low cost manufacturing process. The working principle of an aluminum-air galvanic cell is used for the cable-shaped battery to power portable and point-of-care medical devices. The battery is catalyzed with a carbon nanotube (CNT)-paper matrix. A scalable manufacturing process using a lathe is developed to wrap a paper layer and a CNT-paper matrix on an aluminum wire. The matrix is then wrapped with a silver-plated copper wire to form the battery cell. The battery is activated through absorption of electrolytes including phosphate-buffered saline, NaOH, urine, saliva, and blood into the CNT-paper matrix. The maximum electric power using a 10 mm-long battery cell is over 1.5 mW. As a demonstration, an LED is powered using two groups of four batteries in parallel connected in series. Considering the material composition and the cable-shaped configuration, the battery is fully disposable, flexible, and potentially compatible with portable biosensors through activation by either reagents or biological fluids.
Analysis of composite ablators using massively parallel computation
NASA Technical Reports Server (NTRS)
Shia, David
1995-01-01
In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both problems. Results from ablative composite plate problems are compared with previous numerical results which did not include the gas storage term. It is found that the through-thickness temperature distribution is not affected much by the gas storage term. However, the through-thickness pressure and stress distributions, and the extent of chemical reactions are different from the previous numerical results. Two types of chemical reaction models are used in the restrained thermal growth testing problem: (1) pressure-independent Arrhenius type rate equations and (2) pressure-dependent Arrhenius type rate equations. The numerical results are compared to experimental results and the pressure-dependent model is able to capture the trend better than the pressure-independent one. Finally, a performance study is done on the hybrid algorithm using the ablative composite plate problem. It is found that there is a good speedup of performance on the CM-5. For 32 CPU's, the speedup of performance is 20. The efficiency of the algorithm is found to be a function of the size and execution time of a given problem and the effective parallelization of the algorithm. It also seems that there is an optimum number of CPU's to use for a given problem.
Global magnetosphere simulations using constrained-transport Hall-MHD with CWENO reconstruction
NASA Astrophysics Data System (ADS)
Lin, L.; Germaschewski, K.; Maynard, K. M.; Abbott, S.; Bhattacharjee, A.; Raeder, J.
2013-12-01
We present a new CWENO (Centrally-Weighted Essentially Non-Oscillatory) reconstruction based MHD solver for the OpenGGCM global magnetosphere code. The solver was built using libMRC, a library for creating efficient parallel PDE solvers on structured grids. The use of libMRC gives us access to its core functionality of providing an automated code generation framework which takes a user provided PDE right hand side in symbolic form to generate an efficient, computer architecture specific, parallel code. libMRC also supports block-structured adaptive mesh refinement and implicit-time stepping through integration with the PETSc library. We validate the new CWENO Hall-MHD solver against existing solvers both in standard test problems as well as in global magnetosphere simulations.
Highly Parallel Alternating Directions Algorithm for Time Dependent Problems
NASA Astrophysics Data System (ADS)
Ganzha, M.; Georgiev, K.; Lirkov, I.; Margenov, S.; Paprzycki, M.
2011-11-01
In our work, we consider the time dependent Stokes equation on a finite time interval and on a uniform rectangular mesh, written in terms of velocity and pressure. For this problem, a parallel algorithm based on a novel direction splitting approach is developed. Here, the pressure equation is derived from a perturbed form of the continuity equation, in which the incompressibility constraint is penalized in a negative norm induced by the direction splitting. The scheme used in the algorithm is composed of two parts: (i) velocity prediction, and (ii) pressure correction. This is a Crank-Nicolson-type two-stage time integration scheme for two and three dimensional parabolic problems in which the second-order derivative, with respect to each space variable, is treated implicitly while the other variable is made explicit at each time sub-step. In order to achieve a good parallel performance the solution of the Poison problem for the pressure correction is replaced by solving a sequence of one-dimensional second order elliptic boundary value problems in each spatial direction. The parallel code is implemented using the standard MPI functions and tested on two modern parallel computer systems. The performed numerical tests demonstrate good level of parallel efficiency and scalability of the studied direction-splitting-based algorithm.
Dual-processing accounts of reasoning, judgment, and social cognition.
Evans, Jonathan St B T
2008-01-01
This article reviews a diverse set of proposals for dual processing in higher cognition within largely disconnected literatures in cognitive and social psychology. All these theories have in common the distinction between cognitive processes that are fast, automatic, and unconscious and those that are slow, deliberative, and conscious. A number of authors have recently suggested that there may be two architecturally (and evolutionarily) distinct cognitive systems underlying these dual-process accounts. However, it emerges that (a) there are multiple kinds of implicit processes described by different theorists and (b) not all of the proposed attributes of the two kinds of processing can be sensibly mapped on to two systems as currently conceived. It is suggested that while some dual-process theories are concerned with parallel competing processes involving explicit and implicit knowledge systems, others are concerned with the influence of preconscious processes that contextualize and shape deliberative reasoning and decision-making.
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
S-HARP: A parallel dynamic spectral partitioner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sohn, A.; Simon, H.
1998-01-01
Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...
2016-01-26
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Application of the θ-method to a telegraphic model of fluid flow in a dual-porosity medium
NASA Astrophysics Data System (ADS)
González-Calderón, Alfredo; Vivas-Cruz, Luis X.; Herrera-Hernández, Erik César
2018-01-01
This work focuses mainly on the study of numerical solutions, which are obtained using the θ-method, of a generalized Warren and Root model that includes a second-order wave-like equation in its formulation. The solutions approximately describe the single-phase hydraulic head in fractures by considering the finite velocity of propagation by means of a Cattaneo-like equation. The corresponding discretized model is obtained by utilizing a non-uniform grid and a non-uniform time step. A simple relationship is proposed to give the time-step distribution. Convergence is analyzed by comparing results from explicit, fully implicit, and Crank-Nicolson schemes with exact solutions: a telegraphic model of fluid flow in a single-porosity reservoir with relaxation dynamics, the Warren and Root model, and our studied model, which is solved with the inverse Laplace transform. We find that the flux and the hydraulic head have spurious oscillations that most often appear in small-time solutions but are attenuated as the solution time progresses. Furthermore, we show that the finite difference method is unable to reproduce the exact flux at time zero. Obtaining results for oilfield production times, which are in the order of months in real units, is only feasible using parallel implicit schemes. In addition, we propose simple parallel algorithms for the memory flux and for the explicit scheme.
MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)
1994-01-01
Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonachea, Dan; Hargrove, P.
GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".
Single Sided Messaging v. 0.6.6
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curry, Matthew Leon; Farmer, Matthew Shane; Hassani, Amin
Single-Sided Messaging (SSM) is a portable, multitransport networking library that enables applications to leverage potential one-sided capabilities of underlying network transports. It also provides desirable semantics that services for highperformance, massively parallel computers can leverage, such as an explicit cancel operation for pending transmissions, as well as enhanced matching semantics favoring large numbers of buffers attached to a single match entry. This release supports TCP/IP, shared memory, and Infiniband.
Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields
NASA Astrophysics Data System (ADS)
Forsythe, Victoriya V.; Makarevich, Roman A.
2016-11-01
An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.
Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers
NASA Technical Reports Server (NTRS)
Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak
1996-01-01
This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamics and aeroacoustics properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state of-the-art audio and video rendering of the propagating acoustic signals.
Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank
2017-12-14
Reinforcement learning (RL) enables robots to learn its optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as an intrinsically generated implicit feedback (rewards) for RL. Initially we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trial with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection (90% bACC)). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.
NASA Astrophysics Data System (ADS)
Dunlap, Lucas
2016-11-01
I argue that Deutsch's model for the behavior of systems traveling around closed timelike curves (CTCs) relies implicitly on a substantive metaphysical assumption. Deutsch is employing a version of quantum theory with a significantly supplemented ontology of parallel existent worlds, which differ in kind from the many worlds of the Everett interpretation. Standard Everett does not support the existence of multiple identical copies of the world, which the D-CTC model requires. This has been obscured because he often refers to the branching structure of Everett as a "multiverse", and describes quantum interference by reference to parallel interacting definite worlds. But he admits that this is only an approximation to Everett. The D-CTC model, however, relies crucially on the existence of a multiverse of parallel interacting worlds. Since his model is supplemented by structures that go significantly beyond quantum theory, and play an ineliminable role in its predictions and explanations, it does not represent a quantum solution to the paradoxes of time travel.
High speed parallel spectral-domain OCT using spectrally encoded line-field illumination
NASA Astrophysics Data System (ADS)
Lee, Kye-Sung; Hur, Hwan; Bae, Ji Yong; Kim, I. Jong; Kim, Dong Uk; Nam, Ki-Hwan; Kim, Geon-Hee; Chang, Ki Soo
2018-01-01
We report parallel spectral-domain optical coherence tomography (OCT) at 500 000 A-scan/s. This is the highest-speed spectral-domain (SD) OCT system using a single line camera. Spectrally encoded line-field scanning is proposed to increase the imaging speed in SD-OCT effectively, and the tradeoff between speed, depth range, and sensitivity is demonstrated. We show that three imaging modes of 125k, 250k, and 500k A-scan/s can be simply switched according to the sample to be imaged considering the depth range and sensitivity. To demonstrate the biological imaging performance of the high-speed imaging modes of the spectrally encoded line-field OCT system, human skin and a whole leaf were imaged at the speed of 250k and 500k A-scan/s, respectively. In addition, there is no sensitivity dependence in the B-scan direction, which is implicit in line-field parallel OCT using line focusing of a Gaussian beam with a cylindrical lens.
Implementation of a partitioned algorithm for simulation of large CSI problems
NASA Technical Reports Server (NTRS)
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1999-01-01
The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG. using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However. performance deteriorates when the mesh is refined using a solution-based error indicator since mesh adaptation for practical problems occurs in a localized region., creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
Fully-Implicit Navier-Stokes (FIN-S)
NASA Technical Reports Server (NTRS)
Kirk, Benjamin S.
2010-01-01
FIN-S is a SUPG finite element code for flow problems under active development at NASA Lyndon B. Johnson Space Center and within PECOS: a) The code is built on top of the libMesh parallel, adaptive finite element library. b) The initial implementation of the code targeted supersonic/hypersonic laminar calorically perfect gas flows & conjugate heat transfer. c) Initial extension to thermochemical nonequilibrium about 9 months ago. d) The technologies in FIN-S have been enhanced through a strongly collaborative research effort with Sandia National Labs.
2017-08-10
simulation models the conformational plasticity along the helix-forming reaction coordinate was limited by free - energy barriers. By comparison the coarse...revealed. The latter becomes evident in comparing the energy Z-score landscapes , where CHARMM22 simulation shows a manifold of shuttling...solvent simulations of calculating the charging free energy of protein conformations.33 Deviation to the protocol by modification of Born radii
Numerical simulation of h-adaptive immersed boundary method for freely falling disks
NASA Astrophysics Data System (ADS)
Zhang, Pan; Xia, Zhenhua; Cai, Qingdong
2018-05-01
In this work, a freely falling disk with aspect ratio 1/10 is directly simulated by using an adaptive numerical model implemented on a parallel computation framework JASMIN. The adaptive numerical model is a combination of the h-adaptive mesh refinement technique and the implicit immersed boundary method (IBM). Our numerical results agree well with the experimental results in all of the six degrees of freedom of the disk. Furthermore, very similar vortex structures observed in the experiment were also obtained.
Comparison of Implicit Collocation Methods for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Jezequel, Fabienne; Zukor, Dorothy (Technical Monitor)
2001-01-01
We combine a high-order compact finite difference scheme to approximate spatial derivatives arid collocation techniques for the time component to numerically solve the two dimensional heat equation. We use two approaches to implement the collocation methods. The first one is based on an explicit computation of the coefficients of polynomials and the second one relies on differential quadrature. We compare them by studying their merits and analyzing their numerical performance. All our computations, based on parallel algorithms, are carried out on the CRAY SV1.
Calculation of transonic aileron buzz
NASA Technical Reports Server (NTRS)
Steger, J. L.; Bailey, H. E.
1979-01-01
An implicit finite-difference computer code that uses a two-layer algebraic eddy viscosity model and exact geometric specification of the airfoil has been used to simulate transonic aileron buzz. The calculated results, which were performed on both the Illiac IV parallel computer processor and the Control Data 7600 computer, are in essential agreement with the original expository wind-tunnel data taken in the Ames 16-Foot Wind Tunnel just after World War II. These results and a description of the pertinent numerical techniques are included.
PISCES: An environment for parallel scientific computation
NASA Technical Reports Server (NTRS)
Pratt, T. W.
1985-01-01
The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Open SHMEM Reference Implementation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pritchard, Howard; Curtis, Anthony; Welch, Aaron
2016-05-12
OpenSHMEM is an effort to create a specification for a standardized API for parallel programming in the Partitioned Global Address Space. Along with the specification the project is also creating a reference implementation of the API. This implementation attempts to be portable, to allow it to be deployed in multiple environments, and to be a starting point for implementations targeted to particular hardware platforms. It will also serve as a springboard for future development of the API.
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
A review of digital microfluidics as portable platforms for lab-on a-chip applications.
Samiei, Ehsan; Tabrizian, Maryam; Hoorfar, Mina
2016-07-07
Following the development of microfluidic systems, there has been a high tendency towards developing lab-on-a-chip devices for biochemical applications. A great deal of effort has been devoted to improve and advance these devices with the goal of performing complete sets of biochemical assays on the device and possibly developing portable platforms for point of care applications. Among the different microfluidic systems used for such a purpose, digital microfluidics (DMF) shows high flexibility and capability of performing multiplex and parallel biochemical operations, and hence, has been considered as a suitable candidate for lab-on-a-chip applications. In this review, we discuss the most recent advances in the DMF platforms, and evaluate the feasibility of developing multifunctional packages for performing complete sets of processes of biochemical assays, particularly for point-of-care applications. The progress in the development of DMF systems is reviewed from eight different aspects, including device fabrication, basic fluidic operations, automation, manipulation of biological samples, advanced operations, detection, biological applications, and finally, packaging and portability of the DMF devices. Success in developing the lab-on-a-chip DMF devices will be concluded based on the advances achieved in each of these aspects.
Manycore Performance-Portability: Kokkos Multidimensional Array Library
Edwards, H. Carter; Sunderland, Daniel; Porter, Vicki; ...
2012-01-01
Large, complex scientific and engineering application code have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices each with its own memory space, (2) data parallel kernels and (3) multidimensional arrays. Kernel executionmore » performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. Optimal data access pattern can be different for different manycore devices – potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introduce device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].« less
Roofline model toolkit: A practical tool for architectural and program analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian
We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
Algorithms and programming tools for image processing on the MPP, part 2
NASA Technical Reports Server (NTRS)
Reeves, Anthony P.
1986-01-01
A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Lesoinne, Michel
1993-01-01
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
Practical aspects of prestack depth migration with finite differences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.; Oldfield, R.A.; Womble, D.E.
1997-07-01
Finite-difference, prestack, depth migrations offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers which is discussed. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatialmore » parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.« less
Portable long trace profiler: Concept and solution
NASA Astrophysics Data System (ADS)
Qian, Shinan; Takacs, Peter; Sostero, Giovanni; Cocco, Daniele
2001-08-01
Since the early development of the penta-prism long trace profiler (LTP) and the in situ LTP, and following the completion of the first in situ distortion profile measurements at Sincrotrone Trieste (ELETTRA) in Italy in 1995, a concept was developed for a compact, portable LTP with the following characteristics: easily installed on synchrotron radiation beam lines, easily carried to different laboratories around the world for measurements and calibration, convenient for use in evaluating the LTP as an in-process tool in the optical workshop, and convenient for use in temporarily installation as required by other special applications. The initial design of a compact LTP optical head was made at ELETTRA in 1995. Since 1997 further efforts to reduce the optical head size and weight, and to improve measurement stability have been made at Brookhaven National Laboratory. This article introduces the following solutions and accomplishments for the portable LTP: (1) a new design for a compact and very stable optical head, (2) the use of a small detector connected to a laptop computer directly via an enhanced parallel port, and there is no extra frame grabber interface and control box, (3) a customized small mechanical slide that uses a compact motor with a connector-sized motor controller, and (4) the use of a laptop computer system. These solutions make the portable LTP able to be packed into two laptop-size cases: one for the computer and one for the rest of the system.
NASA Astrophysics Data System (ADS)
Dey, T.; Rodrigue, P.
2015-07-01
We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computational intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 103 to 104 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization either on the Xeon Phi and the Host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization either OpenMP and ISPC tasking (based on pthreads) are evaluated.Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination showed that a reasonable speedup of sensitivity map calculation could be achieved on the Xeon Phi either by a portable or a hardware specific implementation.
NASA Astrophysics Data System (ADS)
Torres, Hilario; Iaccarino, Gianluca
2017-11-01
Soleil-X is a multi-physics solver being developed at Stanford University as a part of the Predictive Science Academic Alliance Program II. Our goal is to conduct high fidelity simulations of particle laden turbulent flows in a radiation environment for solar energy receiver applications as well as to demonstrate our readiness to effectively utilize next generation Exascale machines. The novel aspect of Soleil-X is that it is built upon the Legion runtime system to enable easy portability to different parallel distributed heterogeneous architectures while also being written entirely in high-level/high-productivity languages (Ebb and Regent). An overview of the Soleil-X software architecture will be given. Results from coupled fluid flow, Lagrangian point particle tracking, and thermal radiation simulations will be presented. Performance diagnostic tools and metrics corresponding the the same cases will also be discussed. US Department of Energy, National Nuclear Security Administration.
NASA Astrophysics Data System (ADS)
Cacace, Mauro; Jacquey, Antoine B.
2017-09-01
Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture-solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton-Raphson or by free Jacobian inexact Newton-Krylow schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).
Parallel Computing of Upwelling in a Rotating Stratified Flow
NASA Astrophysics Data System (ADS)
Cui, A.; Street, R. L.
1997-11-01
A code for the three-dimensional, unsteady, incompressible, and turbulent flow has been implemented on the IBM SP2, using message passing. The effects of rotation and variable density are included. A finite volume method is used to discretize the Navier-Stokes equations in general curvilinear coordinates on a non-staggered grid. All the spatial derivatives are approximated using second-order central differences with the exception of the convection terms, which are handled with special upwind-difference schemes. The semi-implicit, second-order accurate, time-advancement scheme employs the Adams-Bashforth method for the explicit terms and Crank-Nicolson for the implicit terms. A multigrid method, with the four-color ZEBRA as smoother, is used to solve the Poisson equation for pressure, while the momentum equations are solved with an approximate factorization technique. The code was successfully validated for a variety test cases. Simulations of a laboratory model of coastal upwelling in a rotating annulus are in progress and will be presented.
NASA Astrophysics Data System (ADS)
Kang, S.; Muralikrishnan, S.; Bui-Thanh, T.
2017-12-01
We propose IMEX HDG-DG schemes for Euler systems on cubed sphere. Of interest is subsonic flow, where the speed of the acoustic wave is faster than that of the nonlinear advection. In order to simulate these flows efficiently, we split the governing system into stiff part describing the fast waves and non-stiff part associated with nonlinear advection. The former is discretized implicitly with HDG method while explicit Runge-Kutta DG discretization is employed for the latter. The proposed IMEX HDG-DG framework: 1) facilitates high-order solution both in time and space; 2) avoids overly small time stepsizes; 3) requires only one linear system solve per time step; and 4) relatively to DG generates smaller and sparser linear system while promoting further parallelism owing to HDG discretization. Numerical results for various test cases demonstrate that our methods are comparable to explicit Runge-Kutta DG schemes in terms of accuracy, while allowing for much larger time stepsizes.
NASA Technical Reports Server (NTRS)
Rudy, D. H.; Morris, D. J.
1976-01-01
An uncoupled time asymptotic alternating direction implicit method for solving the Navier-Stokes equations was tested on two laminar parallel mixing flows. A constant total temperature was assumed in order to eliminate the need to solve the full energy equation; consequently, static temperature was evaluated by using algebraic relationship. For the mixing of two supersonic streams at a Reynolds number of 1,000, convergent solutions were obtained for a time step 5 times the maximum allowable size for an explicit method. The solution diverged for a time step 10 times the explicit limit. Improved convergence was obtained when upwind differencing was used for convective terms. Larger time steps were not possible with either upwind differencing or the diagonally dominant scheme. Artificial viscosity was added to the continuity equation in order to eliminate divergence for the mixing of a subsonic stream with a supersonic stream at a Reynolds number of 1,000.
2013-07-01
drugs at potentially very low concentrations in a variety of complex media such as saliva, blood, urine , and other bodily fluids. These handheld...flow assays, such as pregnancy tests, disease and drug abuse screens, and blood protein markers, exhibit widespread use. Recent parallel advances...gadgets have the potential to lower the cost of diagnosis and save immense amounts of time by removing the need to collect, preserve, and ship samples
Wu, Weiwei; Bai, Suo; Yuan, Miaomiao; Qin, Yong; Wang, Zhong Lin; Jing, Tao
2012-07-24
Wearable nanogenerators are of vital importance to portable energy-harvesting and personal electronics. Here we report a method to synthesize a lead zirconate titanate textile in which nanowires are parallel with each other and a procedure to make it into flexible and wearable nanogenerators. The nanogenerator can generate 6 V output voltage and 45 nA output current, which are large enough to power a liquid crystal display and a UV sensor.
MOOSE: A PARALLEL COMPUTATIONAL FRAMEWORK FOR COUPLED SYSTEMS OF NONLINEAR EQUATIONS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
G. Hansen; C. Newman; D. Gaston
Systems of coupled, nonlinear partial di?erential equations often arise in sim- ulation of nuclear processes. MOOSE: Multiphysics Ob ject Oriented Simulation Environment, a parallel computational framework targeted at solving these systems is presented. As opposed to traditional data / ?ow oriented com- putational frameworks, MOOSE is instead founded on mathematics based on Jacobian-free Newton Krylov (JFNK). Utilizing the mathematical structure present in JFNK, physics are modularized into “Kernels” allowing for rapid production of new simulation tools. In addition, systems are solved fully cou- pled and fully implicit employing physics based preconditioning allowing for a large amount of ?exibility even withmore » large variance in time scales. Background on the mathematics, an inspection of the structure of MOOSE and several rep- resentative solutions from applications built on the framework are presented.« less
MOOSE: A parallel computational framework for coupled systems of nonlinear equations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Derek Gaston; Chris Newman; Glen Hansen
Systems of coupled, nonlinear partial differential equations (PDEs) often arise in simulation of nuclear processes. MOOSE: Multiphysics Object Oriented Simulation Environment, a parallel computational framework targeted at the solution of such systems, is presented. As opposed to traditional data-flow oriented computational frameworks, MOOSE is instead founded on the mathematical principle of Jacobian-free Newton-Krylov (JFNK) solution methods. Utilizing the mathematical structure present in JFNK, physics expressions are modularized into `Kernels,'' allowing for rapid production of new simulation tools. In addition, systems are solved implicitly and fully coupled, employing physics based preconditioning, which provides great flexibility even with large variance in timemore » scales. A summary of the mathematics, an overview of the structure of MOOSE, and several representative solutions from applications built on the framework are presented.« less
Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.; Lewis, Robert R.
2011-11-30
Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement calledmore » Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.« less
Infusing pleasure: Mood effects of the consumption of a single cup of tea.
Einöther, Suzanne J L; Rowson, Matthew; Ramaekers, Johannes G; Giesbrecht, Timo
2016-08-01
Tea has historically been associated with mood benefits. Nevertheless, few studies have empirically investigated mood changes after tea consumption. We explored immediate effects of a single cup of tea up to an hour post-consumption on self-reported valence, arousal, discrete emotions, and implicit measures of mood. In a parallel group design, 153 participants received a cup of tea or placebo tea, or a glass of water. Immediately (i.e. 5 min) after consumption, tea increased valence but reduced arousal, as compared to the placebo. There were no differences at later time points. Discrete emotions did not differ significantly between conditions, immediately or over time. Water consumption increased implicit positivity as compared to placebo. Finally, consumption of tea and water resulted in higher interest in activities overall and in specific activity types compared to placebo. The present study shows that effects of a single cup of tea may be limited to an immediate increase in pleasure and decrease in arousal, which can increase interest in activities. Differences between tea and water were not significant, while differences between water and placebo on implicit measures were unexpected. More servings over a longer time may be required to evoke tea's arousing effects and appropriate tea consumption settings may evoke more enduring valence effects. Copyright © 2016 Elsevier Ltd. All rights reserved.
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octtree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data-points that are within the spatial and temporal kernel bandwidths. Then, we quantify computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable of other space-time analytical tests.
Multitasking the INS3D-LU code on the Cray Y-MP
NASA Technical Reports Server (NTRS)
Fatoohi, Rod; Yoon, Seokkwan
1991-01-01
This paper presents the results of multitasking the INS3D-LU code on eight processors. The code is a full Navier-Stokes solver for incompressible fluid in three dimensional generalized coordinates using a lower-upper symmetric-Gauss-Seidel implicit scheme. This code has been fully vectorized on oblique planes of sweep and parallelized using autotasking with some directives and minor modifications. The timing results for five grid sizes are presented and analyzed. The code has achieved a processing rate of over one Gflops.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spencer, Benjamin Whiting; Crane, Nathan K.; Heinstein, Martin W.
2011-03-01
Adagio is a Lagrangian, three-dimensional, implicit code for the analysis of solids and structures. It uses a multi-level iterative solver, which enables it to solve problems with large deformations, nonlinear material behavior, and contact. It also has a versatile library of continuum and structural elements, and an extensive library of material models. Adagio is written for parallel computing environments, and its solvers allow for scalable solutions of very large problems. Adagio uses the SIERRA Framework, which allows for coupling with other SIERRA mechanics codes. This document describes the functionality and input structure for Adagio.
Parallelizing a peanut butter sandwich
NASA Astrophysics Data System (ADS)
Quenette, S. M.
2005-12-01
This poster aims to demonstrate, in a novel way, why contemporary computational code development is seemingly hard to a geodynamics modeler (i.e. a non-computer-scientist). For example, to utilise comtemporary computer hardware, parallelisation is required. But why do we chose the explicit approach (MPI) over an implicit (OpenMP) one? How does this relate to the typical geodynamics codes. And do we face this same style of problems in every day life? We aim to demonstrate that the little bit of complexity, fore-thought and effort is worth its while.
Calibrationless parallel magnetic resonance imaging: a joint sparsity model.
Majumdar, Angshul; Chaudhury, Kunal Narayan; Ward, Rabab
2013-12-05
State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity map for SENSE, SMASH and interpolation weights for GRAPPA, SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we have proposed a parallel MRI technique that does not require any calibration but yields reconstruction results that are at par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method required solving non-convex analysis and synthesis prior joint-sparsity problems. This work also derives the algorithms for solving them. Experimental validation was carried out on two datasets-eight channel brain and eight channel Shepp-Logan phantom. Two sampling methods were used-Variable Density Random sampling and non-Cartesian Radial sampling. For the brain data, acceleration factor of 4 was used and for the other an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed image and the originals. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques; two calibrated methods-CS SENSE and l1SPIRiT and two calibration free techniques-Distributed CS and SAKE. Our method yields better reconstruction results than all of them.
NASA Astrophysics Data System (ADS)
Xing, F.; Masson, R.; Lopez, S.
2017-09-01
This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.
A Portable Parallel Implementation of the U.S. Navy Layered Ocean Model
1995-01-01
Wallcraft, PhD (I.C. 1981) Planning Systems Inc. & P. R. Moore, PhD (Camb. 1971) IC Dept. Math. DR Moore 1° Encontro de Metodos Numericos...Kendall Square, Hypercube, D R Moore 1 ° Encontro de Metodos Numericos para Equacöes de Derivadas Parciais A. J. Wallcraft IC Mathematics...chips: Chips Machine DEC Alpha CrayT3D/E SUN Sparc Fujitsu AP1000 Intel 860 Paragon D R Moore 1° Encontro de Metodos Numericos para Equacöes
Social media in dermatology: moving to Web 2.0.
Travers, Robin L
2012-09-01
Patient use of social media platforms for accessing medical information has accelerated in parallel with overall use of the Internet. Dermatologists must keep pace with our patients' use of these media through either passive or active means are outlined in detail for 4 specific social media outlets. A 5-step plan for active engagement in social media applications is presented. Implications for medical professionalism, Health Insurance Portability and Accountability Act compliance, and crisis management are discussed. Copyright © 2012. Published by Elsevier Inc.
Free energy landscape of protein folding in water: explicit vs. implicit solvent.
Zhou, Ruhong
2003-11-01
The Generalized Born (GB) continuum solvent model is arguably the most widely used implicit solvent model in protein folding and protein structure prediction simulations; however, it still remains an open question on how well the model behaves in these large-scale simulations. The current study uses the beta-hairpin from C-terminus of protein G as an example to explore the folding free energy landscape with various GB models, and the results are compared to the explicit solvent simulations and experiments. All free energy landscapes are obtained from extensive conformation space sampling with a highly parallel replica exchange method. Because solvation model parameters are strongly coupled with force fields, five different force field/solvation model combinations are examined and compared in this study, namely the explicit solvent model: OPLSAA/SPC model, and the implicit solvent models: OPLSAA/SGB (Surface GB), AMBER94/GBSA (GB with Solvent Accessible Surface Area), AMBER96/GBSA, and AMBER99/GBSA. Surprisingly, we find that the free energy landscapes from implicit solvent models are quite different from that of the explicit solvent model. Except for AMBER96/GBSA, all other implicit solvent models find the lowest free energy state not the native state. All implicit solvent models show erroneous salt-bridge effects between charged residues, particularly in OPLSAA/SGB model, where the overly strong salt-bridge effect results in an overweighting of a non-native structure with one hydrophobic residue F52 expelled from the hydrophobic core in order to make better salt bridges. On the other hand, both AMBER94/GBSA and AMBER99/GBSA models turn the beta-hairpin in to an alpha-helix, and the alpha-helical content is much higher than the previously reported alpha-helices in an explicit solvent simulation with AMBER94 (AMBER94/TIP3P). Only AMBER96/GBSA shows a reasonable free energy landscape with the lowest free energy structure the native one despite an erroneous salt-bridge between D47 and K50. Detailed results on free energy contour maps, lowest free energy structures, distribution of native contacts, alpha-helical content during the folding process, NOE comparison with NMR, and temperature dependences are reported and discussed for all five models. Copyright 2003 Wiley-Liss, Inc.
Implementing Multidisciplinary and Multi-Zonal Applications Using MPI
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.
1995-01-01
Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. Unfortunately, simple message passing models, like Intel's NX library, only allow point-to-point and global communication within a single system-defined partition. This makes implementation of these applications quite difficult, if not impossible. In this report it is shown that the new Message Passing Interface (MPI) standard is a viable portable library for implementing the message passing portion of multidisciplinary applications. Further, with the extension of a portable loader, fully portable multidisciplinary application programs can be developed. Finally, the performance of MPI is compared to that of some native message passing libraries. This comparison shows that MPI can be implemented to deliver performance commensurate with native message libraries.
Early Experiences Writing Performance Portable OpenMP 4 Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Hernandez, Oscar R
In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilersmore » to measure the performance variations of a simple application kernel when executed on the OLCF s Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.« less
Stand-Sit Microchip for High-Throughput, Multiplexed Analysis of Single Cancer Cells.
Ramirez, Lisa; Herschkowitz, Jason I; Wang, Jun
2016-09-01
Cellular heterogeneity in function and response to therapeutics has been a major challenge in cancer treatment. The complex nature of tumor systems calls for the development of advanced multiplexed single-cell tools that can address the heterogeneity issue. However, to date such tools are only available in a laboratory setting and don't have the portability to meet the needs in point-of-care cancer diagnostics. Towards that application, we have developed a portable single-cell system that is comprised of a microchip and an adjustable clamp, so on-chip operation only needs pipetting and adjusting of clamping force. Up to 10 proteins can be quantitated from each cell with hundreds of single-cell assays performed in parallel from one chip operation. We validated the technology and analyzed the oncogenic signatures of cancer stem cells by quantitating both aldehyde dehydrogenase (ALDH) activities and 5 signaling proteins in single MDA-MB-231 breast cancer cells. The technology has also been used to investigate the PI3K pathway activities of brain cancer cells expressing mutant epidermal growth factor receptor (EGFR) after drug intervention targeting EGFR signaling. Our portable single-cell system will potentially have broad application in the preclinical and clinical settings for cancer diagnosis in the future.
de Miguel, Dunia; Burgaleta, Carmen; Reyes, Eduardo; Pascual, Teresa
2003-07-01
We evaluated a new portable monitor (AvoSure PT PRO, Menarini Diagnostics, Firenze, Italy) developed to test the prothrombin time in capillary blood and plasma by comparing it with the standard laboratory determination. We studied 62 patients receiving acenocoumarol therapy. The international normalized ratio (INR) in capillary blood was analyzed by 2 methods: AvoSure PT PRO and Thrombotrack Nycomed Analyzer (Axis-Shield, Dundee, Scotland). Parallel studies were performed in plasma samples by a reference method using the Behring Coagulation Timer (Behring Diagnostics, Marburg, Germany). Plasma samples also were tested with the AvoSure PT PRO. Correlation was good for INR values for capillary blood and plasma samples by AvoSure PT PRO and our reference method (R2 = 0.8596) and for capillary blood samples tested by the AvoSure PT PRO and Thrombotrack Nycomed Analyzer (R2 = 0.8875). The correlation for INR in capillary blood and plasma samples by AvoSure PT PRO was 0.6939 (P < .0004). Capillary blood determinations are rapid and effective for monitoring oral anticoagulation therapy and have a high correlation to plasma determinations. AvoSure PT PRO is accurate for controlling INR in plasma and capillary blood samples, may be used in outpatient clinics, and has advantages over previous portable monitors.
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 106 spatial cells and 1 × 1012 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40:74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.
A Comparison of Three Programming Models for Adaptive Applications
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswa, Rupak; Kwak, Dochan (Technical Monitor)
2000-01-01
We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on the state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to the performance gain. However it may also suffer from the poor spatial locality of physically distributed shared data on large number of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates more balanced result for our application.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue
NASA Astrophysics Data System (ADS)
Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.
2017-11-01
The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
Parallel community climate model: Description and user`s guide
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drake, J.B.; Flanery, R.E.; Semeraro, B.D.
This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less
MrGrid: A Portable Grid Based Molecular Replacement Pipeline
Reboul, Cyril F.; Androulakis, Steve G.; Phan, Jennifer M. N.; Whisstock, James C.; Goscinski, Wojtek J.; Abramson, David; Buckle, Ashley M.
2010-01-01
Background The crystallographic determination of protein structures can be computationally demanding and for difficult cases can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. But the need to trial permutations of search models, space group symmetries and other parameters makes MR time- and labour-intensive. However, MR calculations are embarrassingly parallel and thus ideally suited to distributed computing. In order to address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, allowing high-throughput MR. Methodology/Principal Findings MrGrid is a portable web based application written in Java/JSP and Ruby, and taking advantage of Apple Xgrid technology. Designed to interface with a user defined Xgrid resource the package manages the distribution of multiple MR runs to the available nodes on the Xgrid. We evaluated MrGrid using 10 different protein test cases on a network of 13 computers, and achieved an average speed up factor of 5.69. Conclusions MrGrid enables the user to retrieve and manage the results of tens to hundreds of MR calculations quickly and via a single web interface, as well as broadening the range of strategies that can be attempted. This high-throughput approach allows parameter sweeps to be performed in parallel, improving the chances of MR success. PMID:20386612
NASA Technical Reports Server (NTRS)
Korzennik, Sylvain
1997-01-01
Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Delta Touchstone supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO, using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction, using PVM was successfully completed. It was successfully ported to SMP architectures (i.e. SUN), and to some MPP architectures (i.e. the CM5). Due to limitation of the implementation of PVM on the Cray T3D, the port to that architecture was not completed at the time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan
PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.
Apparatus to collect, classify, concentrate, and characterize gas-borne particles
Rader, Daniel J.; Torczynski, John R.; Wally, Karl; Brockmann, John E.
2002-01-01
An aerosol lab-on-a-chip (ALOC) integrates one or more of a variety of aerosol collection, classification, concentration (enrichment), and characterization processes onto a single substrate or layered stack of such substrates. By taking advantage of modern micro-machining capabilities, an entire suite of discrete laboratory aerosol handling and characterization techniques can be combined in a single portable device that can provide a wealth of data on the aerosol being sampled. The ALOC offers parallel characterization techniques and close proximity of the various characterization modules helps ensure that the same aerosol is available to all devices (dramatically reducing sampling and transport errors). Micro-machine fabrication of the ALOC significantly reduces unit costs relative to existing technology, and enables the fabrication of small, portable ALOC devices, as well as the potential for rugged design to allow operation in harsh environments. Miniaturization also offers the potential of working with smaller particle sizes and lower pressure drops (leading to reduction of power consumption).
Development of a Portable 3CCD Camera System for Multispectral Imaging of Biological Samples
Lee, Hoyoung; Park, Soo Hyun; Noh, Sang Ha; Lim, Jongguk; Kim, Moon S.
2014-01-01
Recent studies have suggested the need for imaging devices capable of multispectral imaging beyond the visible region, to allow for quality and safety evaluations of agricultural commodities. Conventional multispectral imaging devices lack flexibility in spectral waveband selectivity for such applications. In this paper, a recently developed portable 3CCD camera with significant improvements over existing imaging devices is presented. A beam-splitter prism assembly for 3CCD was designed to accommodate three interference filters that can be easily changed for application-specific multispectral waveband selection in the 400 to 1000 nm region. We also designed and integrated electronic components on printed circuit boards with firmware programming, enabling parallel processing, synchronization, and independent control of the three CCD sensors, to ensure the transfer of data without significant delay or data loss due to buffering. The system can stream 30 frames (3-waveband images in each frame) per second. The potential utility of the 3CCD camera system was demonstrated in the laboratory for detecting defect spots on apples. PMID:25350510
Configurable analog-digital conversion using the neural engineering framework
Mayr, Christian G.; Partzsch, Johannes; Noack, Marko; Schüffny, Rene
2014-01-01
Efficient Analog-Digital Converters (ADC) are one of the mainstays of mixed-signal integrated circuit design. Besides the conventional ADCs used in mainstream ICs, there have been various attempts in the past to utilize neuromorphic networks to accomplish an efficient crossing between analog and digital domains, i.e., to build neurally inspired ADCs. Generally, these have suffered from the same problems as conventional ADCs, that is they require high-precision, handcrafted analog circuits and are thus not technology portable. In this paper, we present an ADC based on the Neural Engineering Framework (NEF). It carries out a large fraction of the overall ADC process in the digital domain, i.e., it is easily portable across technologies. The analog-digital conversion takes full advantage of the high degree of parallelism inherent in neuromorphic networks, making for a very scalable ADC. In addition, it has a number of features not commonly found in conventional ADCs, such as a runtime reconfigurability of the ADC sampling rate, resolution and transfer characteristic. PMID:25100933
Multi-scale simulations of space problems with iPIC3D
NASA Astrophysics Data System (ADS)
Lapenta, Giovanni; Bettarini, Lapo; Markidis, Stefano
The implicit Particle-in-Cell method for the computer simulation of space plasma, and its im-plementation in a three-dimensional parallel code, called iPIC3D, are presented. The implicit integration in time of the Vlasov-Maxwell system removes the numerical stability constraints and enables kinetic plasma simulations at magnetohydrodynamics scales. Simulations of mag-netic reconnection in plasma are presented to show the effectiveness of the algorithm. In particular we will show a number of simulations done for large scale 3D systems using the physical mass ratio for Hydrogen. Most notably one simulation treats kinetically a box of tens of Earth radii in each direction and was conducted using about 16000 processors of the Pleiades NASA computer. The work is conducted in collaboration with the MMS-IDS theory team from University of Colorado (M. Goldman, D. Newman and L. Andersson). Reference: Stefano Markidis, Giovanni Lapenta, Rizwan-uddin Multi-scale simulations of plasma with iPIC3D Mathematics and Computers in Simulation, Available online 17 October 2009, http://dx.doi.org/10.1016/j.matcom.2009.08.038
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacon, Luis; Stanier, Adam John
Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm ismore » shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.« less
NASA Astrophysics Data System (ADS)
Yang, Haijian; Sun, Shuyu; Yang, Chao
2017-03-01
Most existing methods for solving two-phase flow problems in porous media do not take the physically feasible saturation fractions between 0 and 1 into account, which often destroys the numerical accuracy and physical interpretability of the simulation. To calculate the solution without the loss of this basic requirement, we introduce a variational inequality formulation of the saturation equilibrium with a box inequality constraint, and use a conservative finite element method for the spatial discretization and a backward differentiation formula with adaptive time stepping for the temporal integration. The resulting variational inequality system at each time step is solved by using a semismooth Newton algorithm. To accelerate the Newton convergence and improve the robustness, we employ a family of adaptive nonlinear elimination methods as a nonlinear preconditioner. Some numerical results are presented to demonstrate the robustness and efficiency of the proposed algorithm. A comparison is also included to show the superiority of the proposed fully implicit approach over the classical IMplicit Pressure-Explicit Saturation (IMPES) method in terms of the time step size and the total execution time measured on a parallel computer.
On the Helix Propensity in Generalized Born Solvent Descriptions of Modeling the Dark Proteome
Olson, Mark A.
2017-01-01
Intrinsically disordered proteins that populate the so-called “Dark Proteome” offer challenging benchmarks of atomistic simulation methods to accurately model conformational transitions on a multidimensional energy landscape. This work explores the application of parallel tempering with implicit solvent models as a computational framework to capture the conformational ensemble of an intrinsically disordered peptide derived from the Ebola virus protein VP35. A recent X-ray crystallographic study reported a protein-peptide interface where the VP35 peptide underwent a folding transition from a disordered form to a helix-β-turn-helix topological fold upon molecular association with the Ebola protein NP. An assessment is provided of the accuracy of two generalized Born solvent models (GBMV2 and GBSW2) using the CHARMM force field and applied with temperature-based replica exchange dynamics to calculate the disorder propensity of the peptide and its probability density of states in a continuum solvent. A further comparison is presented of applying an explicit/implicit solvent hybrid replica exchange simulation of the peptide to determine the effect of modeling water interactions at the all-atom resolution. PMID:28197405
On the Helix Propensity in Generalized Born Solvent Descriptions of Modeling the Dark Proteome.
Olson, Mark A
2017-01-01
Intrinsically disordered proteins that populate the so-called "Dark Proteome" offer challenging benchmarks of atomistic simulation methods to accurately model conformational transitions on a multidimensional energy landscape. This work explores the application of parallel tempering with implicit solvent models as a computational framework to capture the conformational ensemble of an intrinsically disordered peptide derived from the Ebola virus protein VP35. A recent X-ray crystallographic study reported a protein-peptide interface where the VP35 peptide underwent a folding transition from a disordered form to a helix-β-turn-helix topological fold upon molecular association with the Ebola protein NP. An assessment is provided of the accuracy of two generalized Born solvent models (GBMV2 and GBSW2) using the CHARMM force field and applied with temperature-based replica exchange dynamics to calculate the disorder propensity of the peptide and its probability density of states in a continuum solvent. A further comparison is presented of applying an explicit/implicit solvent hybrid replica exchange simulation of the peptide to determine the effect of modeling water interactions at the all-atom resolution.
The force on the flex: Global parallelism and portability
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1986-01-01
A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
NASA Technical Reports Server (NTRS)
Abdi, Frank
1996-01-01
A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership.
NASA Astrophysics Data System (ADS)
Kifonidis, K.; Müller, E.
2012-08-01
Aims: We describe and study a family of new multigrid iterative solvers for the multidimensional, implicitly discretized equations of hydrodynamics. Schemes of this class are free of the Courant-Friedrichs-Lewy condition. They are intended for simulations in which widely differing wave propagation timescales are present. A preferred solver in this class is identified. Applications to some simple stiff test problems that are governed by the compressible Euler equations, are presented to evaluate the convergence behavior, and the stability properties of this solver. Algorithmic areas are determined where further work is required to make the method sufficiently efficient and robust for future application to difficult astrophysical flow problems. Methods: The basic equations are formulated and discretized on non-orthogonal, structured curvilinear meshes. Roe's approximate Riemann solver and a second-order accurate reconstruction scheme are used for spatial discretization. Implicit Runge-Kutta (ESDIRK) schemes are employed for temporal discretization. The resulting discrete equations are solved with a full-coarsening, non-linear multigrid method. Smoothing is performed with multistage-implicit smoothers. These are applied here to the time-dependent equations by means of dual time stepping. Results: For steady-state problems, our results show that the efficiency of the present approach is comparable to the best implicit solvers for conservative discretizations of the compressible Euler equations that can be found in the literature. The use of red-black as opposed to symmetric Gauss-Seidel iteration in the multistage-smoother is found to have only a minor impact on multigrid convergence. This should enable scalable parallelization without having to seriously compromise the method's algorithmic efficiency. For time-dependent test problems, our results reveal that the multigrid convergence rate degrades with increasing Courant numbers (i.e. time step sizes). Beyond a Courant number of nine thousand, even complete multigrid breakdown is observed. Local Fourier analysis indicates that the degradation of the convergence rate is associated with the coarse-grid correction algorithm. An implicit scheme for the Euler equations that makes use of the present method was, nevertheless, able to outperform a standard explicit scheme on a time-dependent problem with a Courant number of order 1000. Conclusions: For steady-state problems, the described approach enables the construction of parallelizable, efficient, and robust implicit hydrodynamics solvers. The applicability of the method to time-dependent problems is presently restricted to cases with moderately high Courant numbers. This is due to an insufficient coarse-grid correction of the employed multigrid algorithm for large time steps. Further research will be required to help us to understand and overcome the observed multigrid convergence difficulties for time-dependent problems.
A Fourier collocation time domain method for numerically solving Maxwell's equations
NASA Technical Reports Server (NTRS)
Shebalin, John V.
1991-01-01
A new method for solving Maxwell's equations in the time domain for arbitrary values of permittivity, conductivity, and permeability is presented. Spatial derivatives are found by a Fourier transform method and time integration is performed using a second order, semi-implicit procedure. Electric and magnetic fields are collocated on the same grid points, rather than on interleaved points, as in the Finite Difference Time Domain (FDTD) method. Numerical results are presented for the propagation of a 2-D Transverse Electromagnetic (TEM) mode out of a parallel plate waveguide and into a dielectric and conducting medium.
On scheduling task systems with variable service times
NASA Astrophysics Data System (ADS)
Maset, Richard G.; Banawan, Sayed A.
1993-08-01
Several strategies have been proposed for developing optimal and near-optimal schedules for task systems (jobs consisting of multiple tasks that can be executed in parallel). Most such strategies, however, implicitly assume deterministic task service times. We show that these strategies are much less effective when service times are highly variable. We then evaluate two strategies—one adaptive, one static—that have been proposed for retaining high performance despite such variability. Both strategies are extensions of critical path scheduling, which has been found to be efficient at producing near-optimal schedules. We found the adaptive approach to be quite effective.
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azad, Ariful; Buluc, Aydn; Pothen, Alex
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting
Azad, Ariful; Buluc, Aydn; Pothen, Alex
2016-03-24
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Simple and powerful visual stimulus generator.
Kremlácek, J; Kuba, M; Kubová, Z; Vít, F
1999-02-01
We describe a cheap, simple, portable and efficient approach to visual stimulation for neurophysiology which does not need any special hardware equipment. The method based on an animation technique uses the FLI autodesk animator format. This form of the animation is replayed by a special program ('player') providing synchronisation pulses toward recording system via parallel port. The 'player is running on an IBM compatible personal computer under MS-DOS operation system and stimulus is displayed on a VGA computer monitor. Various stimuli created with this technique for visual evoked potentials (VEPs) are presented.
The WorkPlace distributed processing environment
NASA Technical Reports Server (NTRS)
Ames, Troy; Henderson, Scott
1993-01-01
Real time control problems require robust, high performance solutions. Distributed computing can offer high performance through parallelism and robustness through redundancy. Unfortunately, implementing distributed systems with these characteristics places a significant burden on the applications programmers. Goddard Code 522 has developed WorkPlace to alleviate this burden. WorkPlace is a small, portable, embeddable network interface which automates message routing, failure detection, and re-configuration in response to failures in distributed systems. This paper describes the design and use of WorkPlace, and its application in the construction of a distributed blackboard system.
GPU-based acceleration of computations in nonlinear finite element deformation analysis.
Mafi, Ramin; Sirouspour, Shahin
2014-03-01
The physics of deformation for biological soft-tissue is best described by nonlinear continuum mechanics-based models, which then can be discretized by the FEM for a numerical solution. However, computational complexity of such models have limited their use in applications requiring real-time or fast response. In this work, we propose a graphic processing unit-based implementation of the FEM using implicit time integration for dynamic nonlinear deformation analysis. This is the most general formulation of the deformation analysis. It is valid for large deformations and strains and can account for material nonlinearities. The data-parallel nature and the intense arithmetic computations of nonlinear FEM equations make it particularly suitable for implementation on a parallel computing platform such as graphic processing unit. In this work, we present and compare two different designs based on the matrix-free and conventional preconditioned conjugate gradients algorithms for solving the FEM equations arising in deformation analysis. The speedup achieved with the proposed parallel implementations of the algorithms will be instrumental in the development of advanced surgical simulators and medical image registration methods involving soft-tissue deformation. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Kwon, Deuk-Chul; Shin, Sung-Sik; Yu, Dong-Hun
2017-10-01
In order to reduce the computing time in simulation of radio frequency (rf) plasma sources, various numerical schemes were developed. It is well known that the upwind, exponential, and power-law schemes can efficiently overcome the limitation on the grid size for fluid transport simulations of high density plasma discharges. Also, the semi-implicit method is a well-known numerical scheme to overcome on the simulation time step. However, despite remarkable advances in numerical techniques and computing power over the last few decades, efficient multi-dimensional modeling of low temperature plasma discharges has remained a considerable challenge. In particular, there was a difficulty on parallelization in time for the time periodic steady state problems such as capacitively coupled plasma discharges and rf sheath dynamics because values of plasma parameters in previous time step are used to calculate new values each time step. Therefore, we present a parallelization method for the time periodic steady state problems by using period-slices. In order to evaluate the efficiency of the developed method, one-dimensional fluid simulations are conducted for describing rf sheath dynamics. The result shows that speedup can be achieved by using a multithreading method.
The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation
NASA Astrophysics Data System (ADS)
Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru
1999-09-01
We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to accommodate on vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 105 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 107 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.
Hybrid Parallel Contour Trees, Version 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher; Fasel, Patricia; Carr, Hamish
A common operation in scientific visualization is to compute and render a contour of a data set. Given a function of the form f : R^d -> R, a level set is defined as an inverse image f^-1(h) for an isovalue h, and a contour is a single connected component of a level set. The Reeb graph can then be defined to be the result of contracting each contour to a single point, and is well defined for Euclidean spaces or for general manifolds. For simple domains, the graph is guaranteed to be a tree, and is called the contourmore » tree. Analysis can then be performed on the contour tree in order to identify isovalues of particular interest, based on various metrics, and render the corresponding contours, without having to know such isovalues a priori. This code is intended to be the first data-parallel algorithm for computing contour trees. Our implementation will use the portable data-parallel primitives provided by Nvidia’s Thrust library, allowing us to compile our same code for both GPUs and multi-core CPUs. Native OpenMP and purely serial versions of the code will likely also be included. It will also be extended to provide a hybrid data-parallel / distributed algorithm, allowing scaling beyond a single GPU or CPU.« less
Parallel 3D Multi-Stage Simulation of a Turbofan Engine
NASA Technical Reports Server (NTRS)
Turner, Mark G.; Topp, David A.
1998-01-01
A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no 1/0 or body force calculation) for a grid which has 227 points axially.
Real time software tools and methodologies
NASA Technical Reports Server (NTRS)
Christofferson, M. J.
1981-01-01
Real time systems are characterized by high speed processing and throughput as well as asynchronous event processing requirements. These requirements give rise to particular implementations of parallel or pipeline multitasking structures, of intertask or interprocess communications mechanisms, and finally of message (buffer) routing or switching mechanisms. These mechanisms or structures, along with the data structue, describe the essential character of the system. These common structural elements and mechanisms are identified, their implementation in the form of routines, tasks or macros - in other words, tools are formalized. The tools developed support or make available the following: reentrant task creation, generalized message routing techniques, generalized task structures/task families, standardized intertask communications mechanisms, and pipeline and parallel processing architectures in a multitasking environment. Tools development raise some interesting prospects in the areas of software instrumentation and software portability. These issues are discussed following the description of the tools themselves.
The Numerical Technique for the Landslide Tsunami Simulations Based on Navier-Stokes Equations
NASA Astrophysics Data System (ADS)
Kozelkov, A. S.
2017-12-01
The paper presents an integral technique simulating all phases of a landslide-driven tsunami. The technique is based on the numerical solution of the system of Navier-Stokes equations for multiphase flows. The numerical algorithm uses a fully implicit approximation method, in which the equations of continuity and momentum conservation are coupled through implicit summands of pressure gradient and mass flow. The method we propose removes severe restrictions on the time step and allows simulation of tsunami propagation to arbitrarily large distances. The landslide origin is simulated as an individual phase being a Newtonian fluid with its own density and viscosity and separated from the water and air phases by an interface. The basic formulas of equation discretization and expressions for coefficients are presented, and the main steps of the computation procedure are described in the paper. To enable simulations of tsunami propagation across wide water areas, we propose a parallel algorithm of the technique implementation, which employs an algebraic multigrid method. The implementation of the multigrid method is based on the global level and cascade collection algorithms that impose no limitations on the paralleling scale and make this technique applicable to petascale systems. We demonstrate the possibility of simulating all phases of a landslide-driven tsunami, including its generation, propagation and uprush. The technique has been verified against the problems supported by experimental data. The paper describes the mechanism of incorporating bathymetric data to simulate tsunamis in real water areas of the world ocean. Results of comparison with the nonlinear dispersion theory, which has demonstrated good agreement, are presented for the case of a historical tsunami of volcanic origin on the Montserrat Island in the Caribbean Sea.
Parallel Semi-Implicit Spectral Element Atmospheric Model
NASA Astrophysics Data System (ADS)
Fournier, A.; Thomas, S.; Loft, R.
2001-05-01
The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1o). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures --derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GF% lOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
NASA Astrophysics Data System (ADS)
Lee, Eun Seok
2000-10-01
An improved aerodynamics performance of a turbine cascade shape can be achieved by an understanding of the flow-field associated with the stator-rotor interaction. In this research, an axial gas turbine airfoil cascade shape is optimized for improved aerodynamic performance by using an unsteady Navier-Stokes solver and a parallel genetic algorithm. The objective of the research is twofold: (1) to develop a computational fluid dynamics code having faster convergence rate and unsteady flow simulation capabilities, and (2) to optimize a turbine airfoil cascade shape with unsteady passing wakes for improved aerodynamic performance. The computer code solves the Reynolds averaged Navier-Stokes equations. It is based on the explicit, finite difference, Runge-Kutta time marching scheme and the Diagonalized Alternating Direction Implicit (DADI) scheme, with the Baldwin-Lomax algebraic and k-epsilon turbulence modeling. Improvements in the code focused on the cascade shape design capability, convergence acceleration and unsteady formulation. First, the inverse shape design method was implemented in the code to provide the design capability, where a surface transpiration concept was employed as an inverse technique to modify the geometry satisfying the user specified pressure distribution on the airfoil surface. Second, an approximation storage multigrid method was implemented as an acceleration technique. Third, the preconditioning method was adopted to speed up the convergence rate in solving the low Mach number flows. Finally, the implicit dual time stepping method was incorporated in order to simulate the unsteady flow-fields. For the unsteady code validation, the Stokes's 2nd problem and the Poiseuille flow were chosen and compared with the computed results and analytic solutions. To test the code's ability to capture the natural unsteady flow phenomena, vortex shedding past a cylinder and the shock oscillation over a bicircular airfoil were simulated and compared with experiments and other research results. The rotor cascade shape optimization with unsteady passing wakes was performed to obtain an improved aerodynamic performance using the unsteady Navier-Stokes solver. Two objective functions were defined as minimization of total pressure loss and maximization of lift, while the mass flow rate was fixed. A parallel genetic algorithm was used as an optimizer and the penalty method was introduced. Each individual's objective function was computed simultaneously by using a 32 processor distributed memory computer. One optimization took about four days.
Empathy: Gender effects in brain and behavior
Christov-Moore, Leonardo; Simpson, Elizabeth A.; Coudé, Gino; Grigaityte, Kristina; Iacoboni, Marco; Ferrari, Pier Francesco
2016-01-01
Evidence suggests that there are differences in the capacity for empathy between males and females. However, how deep do these differences go? Stereotypically, females are portrayed as more nurturing and empathetic, while males are portrayed as less emotional and more cognitive. Some authors suggest that observed gender differences might be largely due to cultural expectations about gender roles. However, empathy has both evolutionary and developmental precursors, and can be studied using implicit measures, aspects that can help elucidate the respective roles of culture and biology. This article reviews evidence from ethology, social psychology, economics, and neuroscience to show that there are fundamental differences in implicit measures of empathy, with parallels in development and evolution. Studies in nonhuman animals and younger human populations (infants/children) offer converging evidence that sex differences in empathy have phylogenetic and ontogenetic roots in biology and are not merely cultural byproducts driven by socialization. We review how these differences may have arisen in response to males’ and females’ different roles throughout evolution. Examinations of the neurobiological underpinnings of empathy reveal important quantitative gender differences in the basic networks involved in affective and cognitive forms of empathy, as well as a qualitative divergence between the sexes in how emotional information is integrated to support decision making processes. Finally, the study of gender differences in empathy can be improved by designing studies with greater statistical power and considering variables implicit in gender (e.g., sexual preference, prenatal hormone exposure). These improvements may also help uncover the nature of neurodevelopmental and psychiatric disorders in which one sex is more vulnerable to compromised social competence associated with impaired empathy. PMID:25236781
A Parallel Processing Algorithm for Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony
2005-01-01
A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level commuters using the Linux operating system. For example on the Medusa cluster at NASA/GSFC, this provides for super computing performance, 130 G(sub flops) (Linpack Benchmark) at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
NASA Astrophysics Data System (ADS)
Yu, H.; Wang, Z.; Zhang, C.; Chen, N.; Zhao, Y.; Sawchuk, A. P.; Dalsing, M. C.; Teague, S. D.; Cheng, Y.
2014-11-01
Existing research of patient-specific computational hemodynamics (PSCH) heavily relies on software for anatomical extraction of blood arteries. Data reconstruction and mesh generation have to be done using existing commercial software due to the gap between medical image processing and CFD, which increases computation burden and introduces inaccuracy during data transformation thus limits the medical applications of PSCH. We use lattice Boltzmann method (LBM) to solve the level-set equation over an Eulerian distance field and implicitly and dynamically segment the artery surfaces from radiological CT/MRI imaging data. The segments seamlessly feed to the LBM based CFD computation of PSCH thus explicit mesh construction and extra data management are avoided. The LBM is ideally suited for GPU (graphic processing unit)-based parallel computing. The parallel acceleration over GPU achieves excellent performance in PSCH computation. An application study will be presented which segments an aortic artery from a chest CT dataset and models PSCH of the segmented artery.
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2014-01-01
This paper presents one-of-a-kind MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting of the telescope. A temporally fourth-order Runge-Kutta, and spatially fifth-order WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32,000 cores and 4 billion cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregularities caused by the highly complex geometry. Limits to scaling beyond 32K cores are identified, and targeted code optimizations are discussed.
Efficient solution of parabolic equations by Krylov approximation methods
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1990-01-01
Numerical techniques for solving parabolic equations by the method of lines is addressed. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. Thus, the resulting approximation consists of applying an evolution operator of a very small dimension to a known vector which is, in turn, computed accurately by exploiting well-known rational approximations to the exponential. Because the rational approximation is only applied to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some relevant approximation and stability issues are discussed. We present some numerical experiments with the method and compare its performance with a few explicit and implicit algorithms.
Flexible language constructs for large parallel programs
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.
NASA Technical Reports Server (NTRS)
Eberhardt, D. S.; Baganoff, D.; Stevens, K.
1984-01-01
Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.
Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.
2018-05-01
Here, we report multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure preserving (also termed physicsmore » compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.« less
Concurrence of rule- and similarity-based mechanisms in artificial grammar learning.
Opitz, Bertram; Hofmann, Juliane
2015-03-01
A current theoretical debate regards whether rule-based or similarity-based learning prevails during artificial grammar learning (AGL). Although the majority of findings are consistent with a similarity-based account of AGL it has been argued that these results were obtained only after limited exposure to study exemplars, and performance on subsequent grammaticality judgment tests has often been barely above chance level. In three experiments the conditions were investigated under which rule- and similarity-based learning could be applied. Participants were exposed to exemplars of an artificial grammar under different (implicit and explicit) learning instructions. The analysis of receiver operating characteristics (ROC) during a final grammaticality judgment test revealed that explicit but not implicit learning led to rule knowledge. It also demonstrated that this knowledge base is built up gradually while similarity knowledge governed the initial state of learning. Together these results indicate that rule- and similarity-based mechanisms concur during AGL. Moreover, it could be speculated that two different rule processes might operate in parallel; bottom-up learning via gradual rule extraction and top-down learning via rule testing. Crucially, the latter is facilitated by performance feedback that encourages explicit hypothesis testing. Copyright © 2015 Elsevier Inc. All rights reserved.
Memory formation during anaesthesia: plausibility of a neurophysiological basis
Veselis, R. A.
2015-01-01
As opposed to conscious, personally relevant (explicit) memories that we can recall at will, implicit (unconscious) memories are prototypical of ‘hidden’ memory; memories that exist, but that we do not know we possess. Nevertheless, our behaviour can be affected by these memories; in fact, these memories allow us to function in an ever-changing world. It is still unclear from behavioural studies whether similar memories can be formed during anaesthesia. Thus, a relevant question is whether implicit memory formation is a realistic possibility during anaesthesia, considering the underlying neurophysiology. A different conceptualization of memory taxonomy is presented, the serial parallel independent model of Tulving, which focuses on dynamic information processing with interactions among different memory systems rather than static classification of different types of memories. The neurophysiological basis for subliminal information processing is considered in the context of brain function as embodied in network interactions. Function of sensory cortices and thalamic activity during anaesthesia are reviewed. The role of sensory and perisensory cortices, in particular the auditory cortex, in support of memory function is discussed. Although improbable, with the current knowledge of neurophysiology one cannot rule out the possibility of memory formation during anaesthesia. PMID:25735711
DOE Office of Scientific and Technical Information (OSTI.GOV)
Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.
Here, we report multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure preserving (also termed physicsmore » compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.« less
Aging and IQ effects on associative recognition and priming in item recognition
McKoon, Gail; Ratcliff, Roger
2012-01-01
Two ways to examine memory for associative relationships between pairs of words were tested: an explicit method, associative recognition, and an implicit method, priming in item recognition. In an experiment with both kinds of tests, participants were asked to learn pairs of words. For the explicit test, participants were asked to decide whether two words of a test pair had been studied in the same or different pairs. For the implicit test, participants were asked to decide whether single words had or had not been among the studied pairs. Some test words were immediately preceded in the test list by the other word of the same pair and some by a word from a different pair. Diffusion model (Ratcliff, 1978; Ratcliff & McKoon, 2008) analyses were carried out for both tasks for college-age participants, 60–74 year olds, and 75–90 year olds, and for higher- and lower-IQ participants, in order to compare the two measures of associative strength. Results showed parallel behavior of drift rates for associative recognition and priming across ages and across IQ, indicating that they are based, at least to some degree, on the same information in memory. PMID:24976676
Nonlinear 3D visco-resistive MHD modeling of fusion plasmas: a comparison between numerical codes
NASA Astrophysics Data System (ADS)
Bonfiglio, D.; Chacon, L.; Cappello, S.
2008-11-01
Fluid plasma models (and, in particular, the MHD model) are extensively used in the theoretical description of laboratory and astrophysical plasmas. We present here a successful benchmark between two nonlinear, three-dimensional, compressible visco-resistive MHD codes. One is the fully implicit, finite volume code PIXIE3D [1,2], which is characterized by many attractive features, notably the generalized curvilinear formulation (which makes the code applicable to different geometries) and the possibility to include in the computation the energy transport equation and the extended MHD version of Ohm's law. In addition, the parallel version of the code features excellent scalability properties. Results from this code, obtained in cylindrical geometry, are compared with those produced by the semi-implicit cylindrical code SpeCyl, which uses finite differences radially, and spectral formulation in the other coordinates [3]. Both single and multi-mode simulations are benchmarked, regarding both reversed field pinch (RFP) and ohmic tokamak magnetic configurations. [1] L. Chacon, Computer Physics Communications 163, 143 (2004). [2] L. Chacon, Phys. Plasmas 15, 056103 (2008). [3] S. Cappello, Plasma Phys. Control. Fusion 46, B313 (2004) & references therein.
Gong, Liang; Wang, JiHua; Yang, XuDong; Feng, Lei; Li, Xiu; Gu, Cui; Wang, MeiHong; Hu, JiaYun; Cheng, Huaidong
2016-01-01
The latest neuroimaging studies about implicit memory (IM) have revealed that different IM types may be processed by different parts of the brain. However, studies have rarely examined what subtypes of IM processes are affected in patients with various brain injuries. Twenty patients with frontal lobe injury, 25 patients with occipital lobe injury, and 29 healthy controls (HC) were recruited for the study. Two subtypes of IM were investigated by using structurally parallel perceptual (picture identification task) and conceptual (category exemplar generation task) IM tests in the three groups, as well as explicit memory (EM) tests. The results indicated that the priming of conceptual IM and EM tasks in patients with frontal lobe injury was poorer than that observed in HC, while perceptual IM was identical between the two groups. By contrast, the priming of perceptual IM in patients with occipital lobe injury was poorer than that in HC, whereas the priming of conceptual IM and EM was similar to that in HC. This double dissociation between perceptual and conceptual IM across the brain areas implies that occipital lobes may participate in perceptual IM, while frontal lobes may be involved in processing conceptual memory. PMID:26793093
A finite element solver for 3-D compressible viscous flows
NASA Technical Reports Server (NTRS)
Reddy, K. C.; Reddy, J. N.; Nayani, S.
1990-01-01
Computation of the flow field inside a space shuttle main engine (SSME) requires the application of state of the art computational fluid dynamic (CFD) technology. Several computer codes are under development to solve 3-D flow through the hot gas manifold. Some algorithms were designed to solve the unsteady compressible Navier-Stokes equations, either by implicit or explicit factorization methods, using several hundred or thousands of time steps to reach a steady state solution. A new iterative algorithm is being developed for the solution of the implicit finite element equations without assembling global matrices. It is an efficient iteration scheme based on a modified nonlinear Gauss-Seidel iteration with symmetric sweeps. The algorithm is analyzed for a model equation and is shown to be unconditionally stable. Results from a series of test problems are presented. The finite element code was tested for couette flow, which is flow under a pressure gradient between two parallel plates in relative motion. Another problem that was solved is viscous laminar flow over a flat plate. The general 3-D finite element code was used to compute the flow in an axisymmetric turnaround duct at low Mach numbers.
Spacecraft charging analysis with the implicit particle-in-cell code iPic3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deca, J.; Lapenta, G.; Marchand, R.
2013-10-15
We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter which creates the possibility to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation resultsmore » with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.« less
NASA Astrophysics Data System (ADS)
Kunz, A.; Pihet, P.; Arend, E.; Menzel, H. G.
1990-12-01
A portable area monitor for the measurement of dose-equivalent quantities in practical radiation-protection work has been developed. The detector applied is a low-pressure proportional counter (TEPC) used in microdosimetry. The complex analysis system required has been optimized with regard to low power consumption and small size to achieve a real operational survey meter. The newly designed electronic includes complete analog, digital and microprocessor boards. It presents the characteristic of fast pulse-height processing over a large (5 decades) dynamic range. Three original circuits have been specifically developed, consisting of: (1) a miniaturized adjustable high-voltage power supply with low ripple and high stability; (2) a double spectroscopy amplifier with constant gain ratio and common pole-zero stage; and (3) an analog-to-digital converter with quasi-logarithmic characteristics based on a flash converter using fast comparators associated in parallel. With the incorporated single-board computer, the maximal total power consumption is 5 W, enabling 40 hours operation time with batteries. With minor adaptations the equipment is proposed as a low-cost solution for various measuring problems in environmental studies.
NASA Astrophysics Data System (ADS)
Zao, Yongming; Ouyang, Qi; Chen, Jiawei; Zhang, Xinglan; Hou, Shuaicheng
2017-08-01
This paper investigates the design and implementation of an improved series-parallel inductor-capacitor-inductor (LsCpLp) resonant circuit power supply for excitation of electromagnetic acoustic transducers (EMATs). The main advantage of the proposed resonant circuit is the absence of a high-permeability dynamic transformer. A high-frequency pulsating voltage gain can be achieved through a double resonance phenomenon. Both resonant tailing behavior and higher harmonics are suppressed by the improved resonant circuit, which also contributes to the generation of ultrasonic waves. Additionally, the proposed circuit can realize impedance matching and can also optimize the transduction efficiency. The complete design and implementation procedure for the power supply is described and has been validated by implementation of the proposed power supply to drive a portable EMAT. The circuit simulation results show close agreement with the experimental results and thus confirm the validity of the proposed topology. The proposed circuit is suitable for use as a portable EMAT excitation power supply that is fed by a low-voltage source.
Freestanding mesoporous VN/CNT hybrid electrodes for flexible all-solid-state supercapacitors.
Xiao, Xu; Peng, Xiang; Jin, Huanyu; Li, Tianqi; Zhang, Chengcheng; Gao, Biao; Hu, Bin; Huo, Kaifu; Zhou, Jun
2013-09-25
High-performance all-solid-state supercapacitors (SCs) are fabricated based on thin, lightweight, and flexible freestanding MVNN/CNT hybrid electrodes. The device shows a high volume capacitance of 7.9 F/cm(3) , volume energy and power density of 0.54 mWh/cm(3) and 0.4 W/cm(3) at a current density of 0.025 A/cm(3) . By being highly flexible, environmentally friendly, and easily connectable in series and parallel, the all-solid-state SCs promise potential applications in portable/wearable electronics. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Toward Enhancing OpenMP's Work-Sharing Directives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, B M; Huang, L; Jin, H
2006-05-17
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
Using collective variables to drive molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Fiorin, Giacomo; Klein, Michael L.; Hénin, Jérôme
2013-12-01
A software framework is introduced that facilitates the application of biasing algorithms to collective variables of the type commonly employed to drive massively parallel molecular dynamics (MD) simulations. The modular framework that is presented enables one to combine existing collective variables into new ones, and combine any chosen collective variable with available biasing methods. The latter include the classic time-dependent biases referred to as steered MD and targeted MD, the temperature-accelerated MD algorithm, as well as the adaptive free-energy biases called metadynamics and adaptive biasing force. The present modular software is extensible, and portable between commonly used MD simulation engines.
On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics
Calcagno, Cristina; Coppo, Mario
2014-01-01
The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327
On designing multicore-aware simulators for systems biology endowed with OnLine statistics.
Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo
2014-01-01
The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.
Scalable Performance Environments for Parallel Systems
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Olson, Robert D.; Aydt, Ruth A.; Madhyastha, Tara M.; Birkett, Thomas; Jensen, David W.; Nazief, Bobby A. A.; Totty, Brian K.
1991-01-01
As parallel systems expand in size and complexity, the absence of performance tools for these parallel systems exacerbates the already difficult problems of application program and system software performance tuning. Moreover, given the pace of technological change, we can no longer afford to develop ad hoc, one-of-a-kind performance instrumentation software; we need scalable, portable performance analysis tools. We describe an environment prototype based on the lessons learned from two previous generations of performance data analysis software. Our environment prototype contains a set of performance data transformation modules that can be interconnected in user-specified ways. It is the responsibility of the environment infrastructure to hide details of module interconnection and data sharing. The environment is written in C++ with the graphical displays based on X windows and the Motif toolkit. It allows users to interconnect and configure modules graphically to form an acyclic, directed data analysis graph. Performance trace data are represented in a self-documenting stream format that includes internal definitions of data types, sizes, and names. The environment prototype supports the use of head-mounted displays and sonic data presentation in addition to the traditional use of visual techniques.
Methodology for cost analysis of film-based and filmless portable chest systems
NASA Astrophysics Data System (ADS)
Melson, David L.; Gauvain, Karen M.; Beardslee, Brian M.; Kraitsik, Michael J.; Burton, Larry; Blaine, G. James; Brink, Gary S.
1996-05-01
Many studies analyzing the costs of film-based and filmless radiology have focused on multi- modality, hospital-wide solutions. Yet due to the enormous cost of converting an entire large radiology department or hospital to a filmless environment all at once, institutions often choose to eliminate film one area at a time. Narrowing the focus of cost-analysis may be useful in making such decisions. This presentation will outline a methodology for analyzing the cost per exam of film-based and filmless solutions for providing portable chest exams to Intensive Care Units (ICUs). The methodology, unlike most in the literature, is based on parallel data collection from existing filmless and film-based ICUs, and is currently being utilized at our institution. Direct costs, taken from the perspective of the hospital, for portable computed radiography chest exams in one filmless and two film-based ICUs are identified. The major cost components are labor, equipment, materials, and storage. Methods for gathering and analyzing each of the cost components are discussed, including FTE-based and time-based labor analysis, incorporation of equipment depreciation, lease, and maintenance costs, and estimation of materials costs. Extrapolation of data from three ICUs to model hypothetical, hospital-wide film-based and filmless ICU imaging systems is described. Performance of sensitivity analysis on the filmless model to assess the impact of anticipated reductions in specific labor, equipment, and archiving costs is detailed. A number of indirect costs, which are not explicitly included in the analysis, are identified and discussed.
Parallel Implementation of a Frozen Flow Based Wavefront Reconstructor
NASA Astrophysics Data System (ADS)
Nagy, J.; Kelly, K.
2013-09-01
Obtaining high resolution images of space objects from ground based telescopes is challenging, often requiring the use of a multi-frame blind deconvolution (MFBD) algorithm to remove blur caused by atmospheric turbulence. In order for an MFBD algorithm to be effective, it is necessary to obtain a good initial estimate of the wavefront phase. Although wavefront sensors work well in low turbulence situations, they are less effective in high turbulence, such as when imaging in daylight, or when imaging objects that are close to the Earth's horizon. One promising approach, which has been shown to work very well in high turbulence settings, uses a frozen flow assumption on the atmosphere to capture the inherent temporal correlations present in consecutive frames of wavefront data. Exploiting these correlations can lead to more accurate estimation of the wavefront phase, and the associated PSF, which leads to more effective MFBD algorithms. However, with the current serial implementation, the approach can be prohibitively expensive in situations when it is necessary to use a large number of frames. In this poster we describe a parallel implementation that overcomes this constraint. The parallel implementation exploits sparse matrix computations, and uses the Trilinos package developed at Sandia National Laboratories. Trilinos provides a variety of core mathematical software for parallel architectures that have been designed using high quality software engineering practices, The package is open source, and portable to a variety of high-performance computing architectures.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Asgharzadeh, Hafez; Borazjani, Iman
2017-02-15
The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for nonlinear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are one of the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs) to solve non-linear systems of equations. The success of NKMs vastly depends on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form preconditioner for matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90 degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation depending on the grid (size) and the flow problem. In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42 - 74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed point Runge-Kutta because it converged with higher time steps and in approximately 30% less iterations even when the grid was stretched and the Reynold number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than NKM with a diagonal Jacobian when the stretching factor was increased, respectively. The NKM with a diagonal analytical Jacobian and matrix-free method with an analytical preconditioner are the fastest methods and the superiority of one to another depends on the flow problem. Furthermore, the implemented methods are fully parallelized with parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
Asgharzadeh, Hafez; Borazjani, Iman
2016-01-01
The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for nonlinear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are one of the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs) to solve non-linear systems of equations. The success of NKMs vastly depends on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form preconditioner for matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90 degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation depending on the grid (size) and the flow problem. In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42 – 74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed point Runge-Kutta because it converged with higher time steps and in approximately 30% less iterations even when the grid was stretched and the Reynold number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than NKM with a diagonal Jacobian when the stretching factor was increased, respectively. The NKM with a diagonal analytical Jacobian and matrix-free method with an analytical preconditioner are the fastest methods and the superiority of one to another depends on the flow problem. Furthermore, the implemented methods are fully parallelized with parallel efficiency of 80–90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future. PMID:28042172
NASA Astrophysics Data System (ADS)
Asgharzadeh, Hafez; Borazjani, Iman
2017-02-01
The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restriction and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for non-linear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are one of the most advanced iterative methods that can be combined with Newton methods, i.e., Newton-Krylov Methods (NKMs) to solve non-linear systems of equations. The success of NKMs vastly depends on the scheme for forming the Jacobian, e.g., automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90 degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation depending on the grid (size) and the flow problem. In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed point Runge-Kutta because it converged with higher time steps and in approximately 30% less iterations even when the grid was stretched and the Reynold number was increased. In fact, stretching the grid decreased the performance of all methods, but the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than NKM with a diagonal and full Jacobian, respectivley, when the stretching factor was increased. The NKM with a diagonal analytical Jacobian and matrix-free method with an analytical preconditioner are the fastest methods and the superiority of one to another depends on the flow problem. Furthermore, the implemented methods are fully parallelized with parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Porting plasma physics simulation codes to modern computing architectures using the
NASA Astrophysics Data System (ADS)
Germaschewski, Kai; Abbott, Stephen
2015-11-01
Available computing power has continued to grow exponentially even after single-core performance satured in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking
NASA Astrophysics Data System (ADS)
Hussein, I.; MacMillan, R.
2014-09-01
Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.
Multiple grid problems on concurrent-processing computers
NASA Technical Reports Server (NTRS)
Eberhardt, D. S.; Baganoff, D.
1986-01-01
Three computer codes were studied which make use of concurrent processing computer architectures in computational fluid dynamics (CFD). The three parallel codes were tested on a two processor multiple-instruction/multiple-data (MIMD) facility at NASA Ames Research Center, and are suggested for efficient parallel computations. The first code is a well-known program which makes use of the Beam and Warming, implicit, approximate factored algorithm. This study demonstrates the parallelism found in a well-known scheme and it achieved speedups exceeding 1.9 on the two processor MIMD test facility. The second code studied made use of an embedded grid scheme which is used to solve problems having complex geometries. The particular application for this study considered an airfoil/flap geometry in an incompressible flow. The scheme eliminates some of the inherent difficulties found in adapting approximate factorization techniques onto MIMD machines and allows the use of chaotic relaxation and asynchronous iteration techniques. The third code studied is an application of overset grids to a supersonic blunt body problem. The code addresses the difficulties encountered when using embedded grids on a compressible, and therefore nonlinear, problem. The complex numerical boundary system associated with overset grids is discussed and several boundary schemes are suggested. A boundary scheme based on the method of characteristics achieved the best results.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures
Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone
2018-01-01
We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Sala, Marzio
In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system ismore » obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10{sup 8} unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.« less
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu
2007-01-01
An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The newmore » parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.« less
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
NASA Astrophysics Data System (ADS)
Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.
2017-12-01
StagBL is an open-source parallel solver and discretization library for geodynamic simulation,encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers.It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran.On top of this abstraction, tools are available to define boundary conditions and interact with particle systems.Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes.By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science.By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, Manycore-, and GPU-optimized variants of matrix-free operators and multigrid components.The common layer provides a framework upon which to introduce innovative new tools.StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool.These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered grid abstraction.We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo. Central to StagBL is the notion of an uninterrupted pipeline from toy/teaching codes to high-performance, extreme-scale solves. StagBLDemo replicates the functionality of an advanced MATLAB-style regional geodynamics code, thus providing users with a concrete procedure to exceed the performance and scalability limitations of smaller-scale tools.
NASA Astrophysics Data System (ADS)
Sun, Hui; Wen, Jiayi; Zhao, Yanxiang; Li, Bo; McCammon, J. Andrew
2015-12-01
Dielectric boundary based implicit-solvent models provide efficient descriptions of coarse-grained effects, particularly the electrostatic effect, of aqueous solvent. Recent years have seen the initial success of a new such model, variational implicit-solvent model (VISM) [Dzubiella, Swanson, and McCammon Phys. Rev. Lett. 96, 087802 (2006) and J. Chem. Phys. 124, 084905 (2006)], in capturing multiple dry and wet hydration states, describing the subtle electrostatic effect in hydrophobic interactions, and providing qualitatively good estimates of solvation free energies. Here, we develop a phase-field VISM to the solvation of charged molecules in aqueous solvent to include more flexibility. In this approach, a stable equilibrium molecular system is described by a phase field that takes one constant value in the solute region and a different constant value in the solvent region, and smoothly changes its value on a thin transition layer representing a smeared solute-solvent interface or dielectric boundary. Such a phase field minimizes an effective solvation free-energy functional that consists of the solute-solvent interfacial energy, solute-solvent van der Waals interaction energy, and electrostatic free energy described by the Poisson-Boltzmann theory. We apply our model and methods to the solvation of single ions, two parallel plates, and protein complexes BphC and p53/MDM2 to demonstrate the capability and efficiency of our approach at different levels. With a diffuse dielectric boundary, our new approach can describe the dielectric asymmetry in the solute-solvent interfacial region. Our theory is developed based on rigorous mathematical studies and is also connected to the Lum-Chandler-Weeks theory (1999). We discuss these connections and possible extensions of our theory and methods.
Sun, Hui; Wen, Jiayi; Zhao, Yanxiang; Li, Bo; McCammon, J Andrew
2015-12-28
Dielectric boundary based implicit-solvent models provide efficient descriptions of coarse-grained effects, particularly the electrostatic effect, of aqueous solvent. Recent years have seen the initial success of a new such model, variational implicit-solvent model (VISM) [Dzubiella, Swanson, and McCammon Phys. Rev. Lett. 96, 087802 (2006) and J. Chem. Phys. 124, 084905 (2006)], in capturing multiple dry and wet hydration states, describing the subtle electrostatic effect in hydrophobic interactions, and providing qualitatively good estimates of solvation free energies. Here, we develop a phase-field VISM to the solvation of charged molecules in aqueous solvent to include more flexibility. In this approach, a stable equilibrium molecular system is described by a phase field that takes one constant value in the solute region and a different constant value in the solvent region, and smoothly changes its value on a thin transition layer representing a smeared solute-solvent interface or dielectric boundary. Such a phase field minimizes an effective solvation free-energy functional that consists of the solute-solvent interfacial energy, solute-solvent van der Waals interaction energy, and electrostatic free energy described by the Poisson-Boltzmann theory. We apply our model and methods to the solvation of single ions, two parallel plates, and protein complexes BphC and p53/MDM2 to demonstrate the capability and efficiency of our approach at different levels. With a diffuse dielectric boundary, our new approach can describe the dielectric asymmetry in the solute-solvent interfacial region. Our theory is developed based on rigorous mathematical studies and is also connected to the Lum-Chandler-Weeks theory (1999). We discuss these connections and possible extensions of our theory and methods.
Sun, Hui; Wen, Jiayi; Zhao, Yanxiang; Li, Bo; McCammon, J. Andrew
2015-01-01
Dielectric boundary based implicit-solvent models provide efficient descriptions of coarse-grained effects, particularly the electrostatic effect, of aqueous solvent. Recent years have seen the initial success of a new such model, variational implicit-solvent model (VISM) [Dzubiella, Swanson, and McCammon Phys. Rev. Lett. 96, 087802 (2006) and J. Chem. Phys. 124, 084905 (2006)], in capturing multiple dry and wet hydration states, describing the subtle electrostatic effect in hydrophobic interactions, and providing qualitatively good estimates of solvation free energies. Here, we develop a phase-field VISM to the solvation of charged molecules in aqueous solvent to include more flexibility. In this approach, a stable equilibrium molecular system is described by a phase field that takes one constant value in the solute region and a different constant value in the solvent region, and smoothly changes its value on a thin transition layer representing a smeared solute-solvent interface or dielectric boundary. Such a phase field minimizes an effective solvation free-energy functional that consists of the solute-solvent interfacial energy, solute-solvent van der Waals interaction energy, and electrostatic free energy described by the Poisson–Boltzmann theory. We apply our model and methods to the solvation of single ions, two parallel plates, and protein complexes BphC and p53/MDM2 to demonstrate the capability and efficiency of our approach at different levels. With a diffuse dielectric boundary, our new approach can describe the dielectric asymmetry in the solute-solvent interfacial region. Our theory is developed based on rigorous mathematical studies and is also connected to the Lum–Chandler–Weeks theory (1999). We discuss these connections and possible extensions of our theory and methods. PMID:26723595
Status of parallel Python-based implementation of UEDGE
NASA Astrophysics Data System (ADS)
Umansky, M. V.; Pankin, A. Y.; Rognlien, T. D.; Dimits, A. M.; Friedman, A.; Joseph, I.
2017-10-01
The tokamak edge transport code UEDGE has long used the code-development and run-time framework Basis. However, with the support for Basis expected to terminate in the coming years, and with the advent of the modern numerical language Python, it has become desirable to move UEDGE to Python, to ensure its long-term viability. Our new Python-based UEDGE implementation takes advantage of the portable build system developed for FACETS. The new implementation gives access to Python's graphical libraries and numerical packages for pre- and post-processing, and support of HDF5 simplifies exchanging data. The older serial version of UEDGE has used for time-stepping the Newton-Krylov solver NKSOL. The renovated implementation uses backward Euler discretization with nonlinear solvers from PETSc, which has the promise to significantly improve the UEDGE parallel performance. We will report on assessment of some of the extended UEDGE capabilities emerging in the new implementation, and will discuss the future directions. Work performed for U.S. DOE by LLNL under contract DE-AC52-07NA27344.
Adaptive Numerical Algorithms in Space Weather Modeling
NASA Technical Reports Server (NTRS)
Toth, Gabor; vanderHolst, Bart; Sokolov, Igor V.; DeZeeuw, Darren; Gombosi, Tamas I.; Fang, Fang; Manchester, Ward B.; Meng, Xing; Nakib, Dalal; Powell, Kenneth G.;
2010-01-01
Space weather describes the various processes in the Sun-Earth system that present danger to human health and technology. The goal of space weather forecasting is to provide an opportunity to mitigate these negative effects. Physics-based space weather modeling is characterized by disparate temporal and spatial scales as well as by different physics in different domains. A multi-physics system can be modeled by a software framework comprising of several components. Each component corresponds to a physics domain, and each component is represented by one or more numerical models. The publicly available Space Weather Modeling Framework (SWMF) can execute and couple together several components distributed over a parallel machine in a flexible and efficient manner. The framework also allows resolving disparate spatial and temporal scales with independent spatial and temporal discretizations in the various models. Several of the computationally most expensive domains of the framework are modeled by the Block-Adaptive Tree Solar wind Roe Upwind Scheme (BATS-R-US) code that can solve various forms of the magnetohydrodynamics (MHD) equations, including Hall, semi-relativistic, multi-species and multi-fluid MHD, anisotropic pressure, radiative transport and heat conduction. Modeling disparate scales within BATS-R-US is achieved by a block-adaptive mesh both in Cartesian and generalized coordinates. Most recently we have created a new core for BATS-R-US: the Block-Adaptive Tree Library (BATL) that provides a general toolkit for creating, load balancing and message passing in a 1, 2 or 3 dimensional block-adaptive grid. We describe the algorithms of BATL and demonstrate its efficiency and scaling properties for various problems. BATS-R-US uses several time-integration schemes to address multiple time-scales: explicit time stepping with fixed or local time steps, partially steady-state evolution, point-implicit, semi-implicit, explicit/implicit, and fully implicit numerical schemes. Depending on the application, we find that different time stepping methods are optimal. Several of the time integration schemes exploit the block-based granularity of the grid structure. The framework and the adaptive algorithms enable physics based space weather modeling and even forecasting.
Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.
1990-01-01
The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized by using central differences. An algebraic turbulence model is also incorporated. The system of linear equations which arises out of the linearization of a fully implicit scheme is solved iteratively by the well known methods of GMRES (Generalized Minimum Residual technique) and Chebyschev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes, but has wide applications in parallel computing and unstructured mesh computations.
Aerodynamic optimization studies on advanced architecture computers
NASA Technical Reports Server (NTRS)
Chawla, Kalpana
1995-01-01
The approach to carrying out multi-discipline aerospace design studies in the future, especially in massively parallel computing environments, comprises of choosing (1) suitable solvers to compute solutions to equations characterizing a discipline, and (2) efficient optimization methods. In addition, for aerodynamic optimization problems, (3) smart methodologies must be selected to modify the surface shape. In this research effort, a 'direct' optimization method is implemented on the Cray C-90 to improve aerodynamic design. It is coupled with an existing implicit Navier-Stokes solver, OVERFLOW, to compute flow solutions. The optimization method is chosen such that it can accomodate multi-discipline optimization in future computations. In the work , however, only single discipline aerodynamic optimization will be included.
Accuracy of a class of concurrent algorithms for transient finite element analysis
NASA Technical Reports Server (NTRS)
Ortiz, Michael; Sotelino, Elisa D.; Nour-Omid, Bahram
1988-01-01
The accuracy of a new class of concurrent procedures for transient finite element analysis is examined. A phase error analysis is carried out which shows that wave retardation leading to unacceptable loss of accuracy may occur if a Courant condition based on the dimensions of the subdomains is violated. Numerical tests suggest that this Courant condition is conservative for typical structural applications and may lead to a marked increase in accuracy as the number of subdomains is increased. Theoretical speed-up ratios are derived which suggest that the algorithms under consideration can be expected to exhibit a performance superior to that of globally implicit methods when implemented on parallel machines.
Error analysis of multipoint flux domain decomposition methods for evolutionary diffusion problems
NASA Astrophysics Data System (ADS)
Arrarás, A.; Portero, L.; Yotov, I.
2014-01-01
We study space and time discretizations for mixed formulations of parabolic problems. The spatial approximation is based on the multipoint flux mixed finite element method, which reduces to an efficient cell-centered pressure system on general grids, including triangles, quadrilaterals, tetrahedra, and hexahedra. The time integration is performed by using a domain decomposition time-splitting technique combined with multiterm fractional step diagonally implicit Runge-Kutta methods. The resulting scheme is unconditionally stable and computationally efficient, as it reduces the global system to a collection of uncoupled subdomain problems that can be solved in parallel without the need for Schwarz-type iteration. Convergence analysis for both the semidiscrete and fully discrete schemes is presented.
Thermal Ablation Modeling for Silicate Materials
NASA Technical Reports Server (NTRS)
Chen, Yih-Kanq
2016-01-01
A general thermal ablation model for silicates is proposed. The model includes the mass losses through the balance between evaporation and condensation, and through the moving molten layer driven by surface shear force and pressure gradient. This model can be applied in the ablation simulation of the meteoroid and the glassy ablator for spacecraft Thermal Protection Systems. Time-dependent axisymmetric computations are performed by coupling the fluid dynamics code, Data-Parallel Line Relaxation program, with the material response code, Two-dimensional Implicit Thermal Ablation simulation program, to predict the mass lost rates and shape change. The predicted mass loss rates will be compared with available data for model validation, and parametric studies will also be performed for meteoroid earth entry conditions.
MODEST: A Tool for Geodesy and Astronomy
NASA Technical Reports Server (NTRS)
Sovers, Ojars J.; Jacobs, Christopher S.; Lanyi, Gabor E.
2004-01-01
Features of the JPL VLBI modeling and estimation software "MODEST" are reviewed. Its main advantages include thoroughly documented model physics, portability, and detailed error modeling. Two unique models are included: modeling of source structure and modeling of both spatial and temporal correlations in tropospheric delay noise. History of the code parallels the development of the astrometric and geodetic VLBI technique and the software retains many of the models implemented during its advancement. The code has been traceably maintained since the early 1980s, and will continue to be updated with recent IERS standards. Scripts are being developed to facilitate user-friendly data processing in the era of e-VLBI.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allan, Benjamin A.
We report on the use and design of a portable, extensible performance data collection tool motivated by modeling needs of the high performance computing systems co-design com- munity. The lightweight performance data collectors with Eiger support is intended to be a tailorable tool, not a shrink-wrapped library product, as pro ling needs vary widely. A single code markup scheme is reported which, based on compilation ags, can send perfor- mance data from parallel applications to CSV les, to an Eiger mysql database, or (in a non-database environment) to at les for later merging and loading on a host with mysqlmore » available. The tool supports C, C++, and Fortran applications.« less
Dynamic multistation photometer
Bauer, Martin L.; Johnson, Wayne F.; Lakomy, Dale G.
1977-01-01
A portable fast analyzer is provided that uses a magnetic clutch/brake to rapidly accelerate the analyzer rotor, and employs a microprocessor for automatic analyzer operation. The rotor is held stationary while the drive motor is run up to speed. When it is desired to mix the sample(s) and reagent(s), the brake is deenergized and the clutch is energized wherein the rotor is very rapidly accelerated to the running speed. The parallel path rotor that is used allows the samples and reagents to be mixed the moment they are spun out into the rotor cuvetes and data acquisition begins immediately. The analyzer will thus have special utility for fast reactions.
Continuous-time ΣΔ ADC with implicit variable gain amplifier for CMOS image sensor.
Tang, Fang; Bermak, Amine; Abbes, Amira; Benammar, Mohieddine Amor
2014-01-01
This paper presents a column-parallel continuous-time sigma delta (CTSD) ADC for mega-pixel resolution CMOS image sensor (CIS). The sigma delta modulator is implemented with a 2nd order resistor/capacitor-based loop filter. The first integrator uses a conventional operational transconductance amplifier (OTA), for the concern of a high power noise rejection. The second integrator is realized with a single-ended inverter-based amplifier, instead of a standard OTA. As a result, the power consumption is reduced, without sacrificing the noise performance. Moreover, the variable gain amplifier in the traditional column-parallel read-out circuit is merged into the front-end of the CTSD modulator. By programming the input resistance, the amplitude range of the input current can be tuned with 8 scales, which is equivalent to a traditional 2-bit preamplification function without consuming extra power and chip area. The test chip prototype is fabricated using 0.18 μm CMOS process and the measurement result shows an ADC power consumption lower than 63.5 μW under 1.4 V power supply and 50 MHz clock frequency.
Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cai, Xiao-Chuan; Keyes, David; Yang, Chao
2014-09-29
The focus of the project is on the development and customization of some highly scalable domain decomposition based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, solid deformation), each is modeled by a certain type of Cahn-Hilliard and/or Allen-Cahn equations. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have many advantages such as ease of implementationmore » since only single field solvers are needed, but also exhibit disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces the parallel efficiency. To overcome the disadvantages, fully coupled approaches have been investigated in order to obtain full physics simulations.« less
NASA Astrophysics Data System (ADS)
Li, Gaohua; Fu, Xiang; Wang, Fuxin
2017-10-01
The low-dissipation high-order accurate hybrid up-winding/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collection detection theory and implicit hole-cutting algorithm achieves an automatic coupling for the near-body and off-body solvers, and the error-and-try method is used for obtaining a globally balanced load distribution among the composed multiple codes. The results of flows over high Reynolds cylinder and two-bladed helicopter rotor show that the combination of high-order hybrid scheme, advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.
Parallel Adjective High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2016-01-01
This paper presents large-scale MPI-parallel computational uid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady ow eld inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fth-order accurate WENO- 5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh re nement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion compu- tational cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks con- taining boundaries. Limits to scaling beyond 32k cores are identi ed, and targeted code optimizations are discussed.
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavitiy Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2015-01-01
This paper presents large-scale MPI-parallel computational uid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady ow eld inside and over the cavity interferes with the optical path and mounting structure of the telescope. A tempo- rally fourth-order accurate Runge-Kutta, and a spatially fth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh re nement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion compu- tational cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks con- taining boundaries. Limits to scaling beyond 32k cores are identi ed, and targeted code optimizations are discussed.
Flexible Language Constructs for Large Parallel Programs
Rosing, Matt; Schnabel, Robert
1994-01-01
The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression ofmore » the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.« less
I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)
NASA Technical Reports Server (NTRS)
Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.
1998-01-01
The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element(PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.
Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang
2008-01-01
Background Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalabilitymore » of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.« less
Parallel Cartesian grid refinement for 3D complex flow simulations
NASA Astrophysics Data System (ADS)
Angelidis, Dionysios; Sotiropoulos, Fotis
2013-11-01
A second order accurate method for discretizing the Navier-Stokes equations on 3D unstructured Cartesian grids is presented. Although the grid generator is based on the oct-tree hierarchical method, fully unstructured data-structure is adopted enabling robust calculations for incompressible flows, avoiding both the need of synchronization of the solution between different levels of refinement and usage of prolongation/restriction operators. The current solver implements a hybrid staggered/non-staggered grid layout, employing the implicit fractional step method to satisfy the continuity equation. The pressure-Poisson equation is discretized by using a novel second order fully implicit scheme for unstructured Cartesian grids and solved using an efficient Krylov subspace solver. The momentum equation is also discretized with second order accuracy and the high performance Newton-Krylov method is used for integrating them in time. Neumann and Dirichlet conditions are used to validate the Poisson solver against analytical functions and grid refinement results to a significant reduction of the solution error. The effectiveness of the fractional step method results in the stability of the overall algorithm and enables the performance of accurate multi-resolution real life simulations. This material is based upon work supported by the Department of Energy under Award Number DE-EE0005482.
NASA Astrophysics Data System (ADS)
Bonfiglio, D.; Chacón, L.; Cappello, S.
2010-08-01
With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacón, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code in cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonfiglio, Daniele; Chacon, Luis; Cappello, Susanna
2010-01-01
With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacon, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code inmore » cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.« less
Memory formation during anaesthesia: plausibility of a neurophysiological basis.
Veselis, R A
2015-07-01
As opposed to conscious, personally relevant (explicit) memories that we can recall at will, implicit (unconscious) memories are prototypical of 'hidden' memory; memories that exist, but that we do not know we possess. Nevertheless, our behaviour can be affected by these memories; in fact, these memories allow us to function in an ever-changing world. It is still unclear from behavioural studies whether similar memories can be formed during anaesthesia. Thus, a relevant question is whether implicit memory formation is a realistic possibility during anaesthesia, considering the underlying neurophysiology. A different conceptualization of memory taxonomy is presented, the serial parallel independent model of Tulving, which focuses on dynamic information processing with interactions among different memory systems rather than static classification of different types of memories. The neurophysiological basis for subliminal information processing is considered in the context of brain function as embodied in network interactions. Function of sensory cortices and thalamic activity during anaesthesia are reviewed. The role of sensory and perisensory cortices, in particular the auditory cortex, in support of memory function is discussed. Although improbable, with the current knowledge of neurophysiology one cannot rule out the possibility of memory formation during anaesthesia. © The Author 2015. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
Development Specification for the Portable Life Support System (PLSS) Thermal Loop Pump
NASA Technical Reports Server (NTRS)
Anchondo, Ian; Campbell, Colin
2017-01-01
The AEMU Thermal Loop Pump Development Specification establishes the requirements for design, performance, and testing of the Water Pump as part of the Thermal System of the Advanced Portable Life Support System (PLSS). It is envisioned that the Thermal Loop Pump is a positive displacement pump that provides a repeatable volume of flow against a given range of back-pressures provided by the various applications. The intention is to operate the pump at a fixed speed for the given application. The primary system is made up of two identical and redundant pumps of which only one is in operation at given time. The Auxiliary Loop Pump is an identical pump design to the primary pumps but is operated at half the flow rate. Inlet positive pressure to the pumps is provided by the upstream Flexible Supply Assembly (FSA-431 and FSA-531) which are physically located inside the suit volume and pressurized by suit pressure. An integrated relief valve, placed in parallel to the pump's inlet and outlet protects the pump and loop from over-pressurization. An integrated course filter is placed upstream of the pump's inlet to provide filtration and prevent potential debris from damaging the pump.
Optical Encoding Technology for Viral Screening Panels Final Report CRADA No TC02132.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lenhoff, R.; Haushalter, R.
This was a collaborative effort between Lawrence Livermore National Security, LLC, Lawrence Livermore National Laboratory (LLNL) and Parallel Synthesis Technologies, Inc. (PSTI), to develop Optical Encoding Technology for Viral Screening Panels. The goal for this effort was to prepare a portable bead reader system that would enable the development of viral and bacterial screening panels which could be used for the detection of any desired set of bacteria or viruses in any location. The main objective was to determine if the combination of a bead-based, PCR suspension array technology, formulated from Parallume encoded beads and PSTI’s multiplex assay reader systemmore » (MARS), could provide advantages in terms of the number of simultaneously measured samples, portability, ruggedness, ease of use, accuracy, precision or cost as compared to the Luminexbased system developed at LLNL. The project underwent several no cost extensions however the overall goal of demonstrating the utility of this new system was achieved. As a result of the project a significant change to the type of bead PSTI used for the suspension system was implemented allowing better performance than the commercial Luminex system.« less
Tian, Ye; Schwieters, Charles D; Opella, Stanley J; Marassi, Francesca M
2017-01-01
Structure determination of proteins by NMR is unique in its ability to measure restraints, very accurately, in environments and under conditions that closely mimic those encountered in vivo. For example, advances in solid-state NMR methods enable structure determination of membrane proteins in detergent-free lipid bilayers, and of large soluble proteins prepared by sedimentation, while parallel advances in solution NMR methods and optimization of detergent-free lipid nanodiscs are rapidly pushing the envelope of the size limit for both soluble and membrane proteins. These experimental advantages, however, are partially squandered during structure calculation, because the commonly used force fields are purely repulsive and neglect solvation, Van der Waals forces and electrostatic energy. Here we describe a new force field, and updated energy functions, for protein structure calculations with EEFx implicit solvation, electrostatics, and Van der Waals Lennard-Jones forces, in the widely used program Xplor-NIH. The new force field is based primarily on CHARMM22, facilitating calculations with a wider range of biomolecules. The new EEFx energy function has been rewritten to enable OpenMP parallelism, and optimized to enhance computation efficiency. It implements solvation, electrostatics, and Van der Waals energy terms together, thus ensuring more consistent and efficient computation of the complete nonbonded energy lists. Updates in the related python module allow detailed analysis of the interaction energies and associated parameters. The new force field and energy function work with both soluble proteins and membrane proteins, including those with cofactors or engineered tags, and are very effective in situations where there are sparse experimental restraints. Results obtained for NMR-restrained calculations with a set of five soluble proteins and five membrane proteins show that structures calculated with EEFx have significant improvements in accuracy, precision, and conformation, and that structure refinement can be obtained by short relaxation with EEFx to obtain improvements in these key metrics. These developments broaden the range of biomolecular structures that can be calculated with high fidelity from NMR restraints.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swesty, F. Douglas; Myra, Eric S.
It is now generally agreed that multidimensional, multigroup, neutrino-radiation hydrodynamics (RHD) is an indispensable element of any realistic model of stellar-core collapse, core-collapse supernovae, and proto-neutron star instabilities. We have developed a new, two-dimensional, multigroup algorithm that can model neutrino-RHD flows in core-collapse supernovae. Our algorithm uses an approach similar to the ZEUS family of algorithms, originally developed by Stone and Norman. However, this completely new implementation extends that previous work in three significant ways: first, we incorporate multispecies, multigroup RHD in a flux-limited-diffusion approximation. Our approach is capable of modeling pair-coupled neutrino-RHD, and includes effects of Pauli blocking inmore » the collision integrals. Blocking gives rise to nonlinearities in the discretized radiation-transport equations, which we evolve implicitly in time. We employ parallelized Newton-Krylov methods to obtain a solution of these nonlinear, implicit equations. Our second major extension to the ZEUS algorithm is the inclusion of an electron conservation equation that describes the evolution of electron-number density in the hydrodynamic flow. This permits calculating deleptonization of a stellar core. Our third extension modifies the hydrodynamics algorithm to accommodate realistic, complex equations of state, including those having nonconvex behavior. In this paper, we present a description of our complete algorithm, giving sufficient details to allow others to implement, reproduce, and extend our work. Finite-differencing details are presented in appendices. We also discuss implementation of this algorithm on state-of-the-art, parallel-computing architectures. Finally, we present results of verification tests that demonstrate the numerical accuracy of this algorithm on diverse hydrodynamic, gravitational, radiation-transport, and RHD sample problems. We believe our methods to be of general use in a variety of model settings where radiation transport or RHD is important. Extension of this work to three spatial dimensions is straightforward.« less
NASA Astrophysics Data System (ADS)
Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.
2014-12-01
The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
Parallel Aircraft Trajectory Optimization with Analytic Derivatives
NASA Technical Reports Server (NTRS)
Falck, Robert D.; Gray, Justin S.; Naylor, Bret
2016-01-01
Trajectory optimization is an integral component for the design of aerospace vehicles, but emerging aircraft technologies have introduced new demands on trajectory analysis that current tools are not well suited to address. Designing aircraft with technologies such as hybrid electric propulsion and morphing wings requires consideration of the operational behavior as well as the physical design characteristics of the aircraft. The addition of operational variables can dramatically increase the number of design variables which motivates the use of gradient based optimization with analytic derivatives to solve the larger optimization problems. In this work we develop an aircraft trajectory analysis tool using a Legendre-Gauss-Lobatto based collocation scheme, providing analytic derivatives via the OpenMDAO multidisciplinary optimization framework. This collocation method uses an implicit time integration scheme that provides a high degree of sparsity and thus several potential options for parallelization. The performance of the new implementation was investigated via a series of single and multi-trajectory optimizations using a combination of parallel computing and constraint aggregation. The computational performance results show that in order to take full advantage of the sparsity in the problem it is vital to parallelize both the non-linear analysis evaluations and the derivative computations themselves. The constraint aggregation results showed a significant numerical challenge due to difficulty in achieving tight convergence tolerances. Overall, the results demonstrate the value of applying analytic derivatives to trajectory optimization problems and lay the foundation for future application of this collocation based method to the design of aircraft with where operational scheduling of technologies is key to achieving good performance.
NASA Astrophysics Data System (ADS)
Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya
2017-02-01
GPU not only is used in the field of graphic technology but also has been widely used in areas needing a large number of numerical calculations. In the energy industry, because of low carbon, high energy density, high duration and other characteristics, the development of nuclear energy cannot easily be replaced by other energy sources. Management of core fuel is one of the major areas of concern in a nuclear power plant, and it is directly related to the economic benefits and cost of nuclear power. The large-scale reactor core expansion equation is large and complicated, so the calculation of the diffusion equation is crucial in the core fuel management process. In this paper, we use CUDA programming technology on a GPU cluster to run the LU-SGS parallel iterative calculation against the background of the diffusion equation of the reactor. We divide one-dimensional and two-dimensional mesh into a plurality of domains, with each domain evenly distributed on the GPU blocks. A parallel collision scheme is put forward that defines the virtual boundary of the grid exchange information and data transmission by non-stop collision. Compared with the serial program, the experiment shows that GPU greatly improves the efficiency of program execution and verifies that GPU is playing a much more important role in the field of numerical calculations.
Efficient Iterative Methods Applied to the Solution of Transonic Flows
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.
1996-02-01
We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
O'Shea, Brian; Watson, Derrick G; Brown, Gordon D A
2016-02-01
How can implicit attitudes best be measured? The Implicit Relational Assessment Procedure (IRAP), unlike the Implicit Association Test (IAT), claims to measure absolute, not just relative, implicit attitudes. In the IRAP, participants make congruent (Fat Person-Active: false; Fat Person-Unhealthy: true) or incongruent (Fat Person-Active: true; Fat Person-Unhealthy: false) responses in different blocks of trials. IRAP experiments have reported positive or neutral implicit attitudes (e.g., neutral attitudes toward fat people) in cases in which negative attitudes are normally found on explicit or other implicit measures. It was hypothesized that these results might reflect a positive framing bias (PFB) that occurs when participants complete the IRAP. Implicit attitudes toward categories with varying prior associations (nonwords, social systems, flowers and insects, thin and fat people) were measured. Three conditions (standard, positive framing, and negative framing) were used to measure whether framing influenced estimates of implicit attitudes. It was found that IRAP scores were influenced by how the task was framed to the participants, that the framing effect was modulated by the strength of prior stimulus associations, and that a default PFB led to an overestimation of positive implicit attitudes when measured by the IRAP. Overall, the findings question the validity of the IRAP as a tool for the measurement of absolute implicit attitudes. A new tool (Simple Implicit Procedure:SIP) for measuring absolute, not just relative, implicit attitudes is proposed. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Gifted Students' Implicit Beliefs about Intelligence and Giftedness
ERIC Educational Resources Information Center
Makel, Matthew C.; Snyder, Kate E.; Thomas, Chandler; Malone, Patrick S.; Putallaz, Martha
2015-01-01
Growing attention is being paid to individuals' implicit beliefs about the nature of intelligence. However, implicit beliefs about giftedness are currently underexamined. In the current study, we examined academically gifted adolescents' implicit beliefs about both intelligence and giftedness. Overall, participants' implicit beliefs about…
Detailed Aerodynamic Analysis of a Shrouded Tail Rotor Using an Unstructured Mesh Flow Solver
NASA Astrophysics Data System (ADS)
Lee, Hee Dong; Kwon, Oh Joon
The detailed aerodynamics of a shrouded tail rotor in hover has been numerically studied using a parallel inviscid flow solver on unstructured meshes. The numerical method is based on a cell-centered finite-volume discretization and an implicit Gauss-Seidel time integration. The calculation was made for a single blade by imposing a periodic boundary condition between adjacent rotor blades. The grid periodicity was also imposed at the periodic boundary planes to avoid numerical inaccuracy resulting from solution interpolation. The results were compared with available experimental data and those from a disk vortex theory for validation. It was found that realistic three-dimensional modeling is important for the prediction of detailed aerodynamics of shrouded rotors including the tip clearance gap flow.
Monte Carlo Transport for Electron Thermal Transport
NASA Astrophysics Data System (ADS)
Chenhall, Jeffrey; Cao, Duc; Moses, Gregory
2015-11-01
The iSNB (implicit Schurtz Nicolai Busquet multigroup electron thermal transport method of Cao et al. is adapted into a Monte Carlo transport method in order to better model the effects of non-local behavior. The end goal is a hybrid transport-diffusion method that combines Monte Carlo Transport with a discrete diffusion Monte Carlo (DDMC). The hybrid method will combine the efficiency of a diffusion method in short mean free path regions with the accuracy of a transport method in long mean free path regions. The Monte Carlo nature of the approach allows the algorithm to be massively parallelized. Work to date on the method will be presented. This work was supported by Sandia National Laboratory - Albuquerque and the University of Rochester Laboratory for Laser Energetics.
Rain scavenging of solid rocket exhaust clouds
NASA Technical Reports Server (NTRS)
Dingle, A. N.
1978-01-01
An explicit model for cloud microphysics was developed for application to the problem of co-condensation/vaporization of HCl and H2O in the presence of Al2O3 particulate nuclei. Validity of the explicit model relative to the implicit model, which has been customarily applied to atmospheric cloud studies, was demonstrated by parallel computations of H2O condensation upon (NH4)2 SO4 nuclei. A mesoscale predictive model designed to account for the impact of wet processes on atmospheric dynamics is also under development. Input data specifying the equilibrium state of HC1 and H2O vapors in contact with aqueous HC1 solutions were found to be limited, particularly in respect to temperature range.
Sampling of Protein Folding Transitions: Multicanonical Versus Replica Exchange Molecular Dynamics.
Jiang, Ping; Yaşar, Fatih; Hansmann, Ulrich H E
2013-08-13
We compare the efficiency of multicanonical and replica exchange molecular dynamics for the sampling of folding/unfolding events in simulations of proteins with end-to-end β -sheet. In Go-model simulations of the 75-residue MNK6, we observe improvement factors of 30 in the number of folding/unfolding events of multicanonical molecular dynamics over replica exchange molecular dynamics. As an application, we use this enhanced sampling to study the folding landscape of the 36-residue DS119 with an all-atom physical force field and implicit solvent. Here, we find that the rate-limiting step is the formation of the central helix that then provides a scaffold for the parallel β -sheet formed by the two chain ends.
An analysis of a nonlinear instability in the implementation of a VTOL control system
NASA Technical Reports Server (NTRS)
Weber, J. M.
1982-01-01
The contributions to nonlinear behavior and unstable response of the model following yaw control system of a VTOL aircraft during hover were determined. The system was designed as a state rate feedback implicit model follower that provided yaw rate command/heading hold capability and used combined full authority parallel and limited authority series servo actuators to generate an input to the yaw reaction control system of the aircraft. Both linear and nonlinear system models, as well as describing function linearization techniques were used to determine the influence on the control system instability of input magnitude and bandwidth, series servo authority, and system bandwidth. Results of the analysis describe stability boundaries as a function of these system design characteristics.
Franck, Erik; De Raedt, Rudi; Dereu, Mieke; Van den Abbeele, Dirk
2007-03-01
In the present study, we have further explored implicit self-esteem in currently depressed individuals. Since suicidal ideation is associated with lower self-esteem in depressed individuals, we measured both implicit and explicit self-esteem in a population of currently depressed (CD) individuals, with and without suicidal ideation (SI), and in a group of non-depressed controls (ND). The results indicate that only CD individuals with SI show a discrepancy between their implicit and explicit self-esteem: that is, they exhibit high implicit and low explicit self-esteem. CD individuals without SI exhibit both low implicit and low explicit self-esteem; and ND controls exhibit both normal implicit and normal explicit self-esteem. These results provide new insights in the study of implicit self-esteem and the combination of implicit and explicit self-esteem in depression.
Dijksterhuis, Ap
2004-02-01
On the basis of a conceptualization of implicit self-esteem as the implicit attitude toward the self, it was predicted that implicit self-esteem could be enhanced by subliminal evaluative conditioning. In 5 experiments, participants were repeatedly presented with trials in which the word I was paired with positive trait terms. Relative to control conditions, this procedure enhanced implicit self-esteem. The effects generalized across 3 measures of implicit self-esteem (Experiments 1-3). Furthermore, evaluative conditioning enhanced implicit self-esteem among people with low-temporal implicit self-esteem and among people with high-temporal implicit self-esteem (Experiment 4). In addition, it was shown that conditioning enhanced self-esteem to such an extent that it made participants insensitive to negative intelligence feedback (Experiments 5a and 5b). Various implications are discussed.
Ireland, Jane L; Adams, Christine
2015-01-01
The current study explores associations between implicit and explicit aggression in young adult male prisoners, seeking to apply the Reflection-Impulsive Model and indicate parity with elements of the General Aggression Model and social cognition. Implicit cognitive aggressive processing is not an area that has been examined among prisoners. Two hundred and sixty two prisoners completed an implicit cognitive aggression measure (Puzzle Test) and explicit aggression measures, covering current behaviour (DIPC-R) and aggression disposition (AQ). It was predicted that dispositional aggression would be predicted by implicit cognitive aggression, and that implicit cognitive aggression would predict current engagement in aggressive behaviour. It was also predicted that more impulsive implicit cognitive processing would associate with aggressive behaviour whereas cognitively effortful implicit cognitive processing would not. Implicit aggressive cognitive processing was associated with increased dispositional aggression but not current reports of aggressive behaviour. Impulsive implicit cognitive processing of an aggressive nature predicted increased dispositional aggression whereas more cognitively effortful implicit cognitive aggression did not. The article concludes by outlining the importance of accounting for implicit cognitive processing among prisoners and the need to separate such processing into facets (i.e. impulsive vs. cognitively effortful). Implications for future research and practice in this novel area of study are indicated. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Framework for Integrating Implicit Bias Recognition Into Health Professions Education.
Sukhera, Javeed; Watling, Chris
2018-01-01
Existing literature on implicit bias is fragmented and comes from a variety of fields like cognitive psychology, business ethics, and higher education, but implicit-bias-informed educational approaches have been underexplored in health professions education and are difficult to evaluate using existing tools. Despite increasing attention to implicit bias recognition and management in health professions education, many programs struggle to meaningfully integrate these topics into curricula. The authors propose a six-point actionable framework for integrating implicit bias recognition and management into health professions education that draws on the work of previous researchers and includes practical tools to guide curriculum developers. The six key features of this framework are creating a safe and nonthreatening learning context, increasing knowledge about the science of implicit bias, emphasizing how implicit bias influences behaviors and patient outcomes, increasing self-awareness of existing implicit biases, improving conscious efforts to overcome implicit bias, and enhancing awareness of how implicit bias influences others. Important considerations for designing implicit-bias-informed curricula-such as individual and contextual variables, as well as formal and informal cultural influences-are discussed. The authors also outline assessment and evaluation approaches that consider outcomes at individual, organizational, community, and societal levels. The proposed framework may facilitate future research and exploration regarding the use of implicit bias in health professions education.
Maina, Ivy W; Belton, Tanisha D; Ginzberg, Sara; Singh, Ajit; Johnson, Tiffani J
2018-02-01
Disparities in the care and outcomes of US racial/ethnic minorities are well documented. Research suggests that provider bias plays a role in these disparities. The implicit association test enables measurement of implicit bias via tests of automatic associations between concepts. Hundreds of studies have examined implicit bias in various settings, but relatively few have been conducted in healthcare. The aim of this systematic review is to synthesize the current knowledge on the role of implicit bias in healthcare disparities. A comprehensive literature search of several databases between May 2015 and September 2016 identified 37 qualifying studies. Of these, 31 found evidence of pro-White or light-skin/anti-Black, Hispanic, American Indian or dark-skin bias among a variety of HCPs across multiple levels of training and disciplines. Fourteen studies examined the association between implicit bias and healthcare outcomes using clinical vignettes or simulated patients. Eight found no statistically significant association between implicit bias and patient care while six studies found that higher implicit bias was associated with disparities in treatment recommendations, expectations of therapeutic bonds, pain management, and empathy. All seven studies that examined the impact of implicit provider bias on real-world patient-provider interaction found that providers with stronger implicit bias demonstrated poorer patient-provider communication. Two studies examined the effect of implicit bias on real-world clinical outcomes. One found an association and the other did not. Two studies tested interventions aimed at reducing bias, but only one found a post-intervention reduction in implicit bias. This review reveals a need for more research exploring implicit bias in real-world patient care, potential modifiers and confounders of the effect of implicit bias on care, and strategies aimed at reducing implicit bias and improving patient-provider communication. Future studies have the opportunity to build on this current body of research, and in doing so will enable us to achieve equity in healthcare and outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Unconscious Motivation. Part I: Implicit Attitudes toward L2 Speakers
ERIC Educational Resources Information Center
Al-Hoorie, Ali H.
2016-01-01
This paper reports the first investigation in the second language acquisition field assessing learners' implicit attitudes using the Implicit Association Test, a computerized reaction-time measure. Examination of the explicit and implicit attitudes of Arab learners of English (N = 365) showed that, particularly for males, implicit attitudes toward…
Perspectives on the Future of CFD
NASA Technical Reports Server (NTRS)
Kwak, Dochan
2000-01-01
This viewgraph presentation gives an overview of the future of computational fluid dynamics (CFD), which in the past has pioneered the field of flow simulation. Over time CFD has progressed as computing power. Numerical methods have been advanced as CPU and memory capacity increases. Complex configurations are routinely computed now and direct numerical simulations (DNS) and large eddy simulations (LES) are used to study turbulence. As the computing resources changed to parallel and distributed platforms, computer science aspects such as scalability (algorithmic and implementation) and portability and transparent codings have advanced. Examples of potential future (or current) challenges include risk assessment, limitations of the heuristic model, and the development of CFD and information technology (IT) tools.
LLVM Infrastructure and Tools Project Summary
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCormick, Patrick Sean
2017-11-06
This project works with the open source LLVM Compiler Infrastructure (http://llvm.org) to provide tools and capabilities that address needs and challenges faced by ECP community (applications, libraries, and other components of the software stack). Our focus is on providing a more productive development environment that enables (i) improved compilation times and code generation for parallelism, (ii) additional features/capabilities within the design and implementations of LLVM components for improved platform/performance portability and (iii) improved aspects related to composition of the underlying implementation details of the programming environment, capturing resource utilization, overheads, etc. -- including runtime systems that are often not easilymore » addressed by application and library developers.« less
Environmental concept for engineering software on MIMD computers
NASA Technical Reports Server (NTRS)
Lopez, L. A.; Valimohamed, K.
1989-01-01
The issues related to developing an environment in which engineering systems can be implemented on MIMD machines are discussed. The problem is presented in terms of implementing the finite element method under such an environment. However, neither the concepts nor the prototype implementation environment are limited to this application. The topics discussed include: the ability to schedule and synchronize tasks efficiently; granularity of tasks; load balancing; and the use of a high level language to specify parallel constructs, manage data, and achieve portability. The objective of developing a virtual machine concept which incorporates solutions to the above issues leads to a design that can be mapped onto loosely coupled, tightly coupled, and hybrid systems.
Portable propellant cutting assembly, and method of cutting propellant with assembly
NASA Technical Reports Server (NTRS)
Sharp, Roger A. (Inventor); Hoskins, Shawn W. (Inventor); Payne, Brett D. (Inventor)
2002-01-01
A propellant cutting assembly and method of using the assembly to cut samples of solid propellant in a repeatable and consistent manner is disclosed. The cutting assembly utilizes two parallel extension beams which are shorter than the diameter of a central bore of an annular solid propellant grain and can be loaded into the central bore. The assembly is equipped with retaining heads at its respective ends and an adjustment mechanism to position and wedge the assembly within the central bore. One end of the assembly is equipped with a cutting blade apparatus which can be extended beyond the end of the extension beams to cut into the solid propellant.
A case for ZnO nanowire field emitter arrays in advanced x-ray source applications
NASA Astrophysics Data System (ADS)
Robinson, Vance S.; Bergkvist, Magnus; Chen, Daokun; Chen, Jun; Huang, Mengbing
2016-09-01
Reviewing current efforts in X-ray source miniaturization reveals a broad spectrum of applications: Portable and/or remote nondestructive evaluation, high throughput protein crystallography, invasive radiotherapy, monitoring fluid flow and particulate generation in situ, and portable radiography devices for battle-front or large scale disaster triage scenarios. For the most part, all of these applications are being addressed with a top-down approach aimed at improving portability, weight and size. That is, the existing system or a critical sub-component is shrunk in some manner in order to miniaturize the overall package. In parallel to top-down x-ray source miniaturization, more recent efforts leverage field emission and semiconductor device fabrication techniques to achieve small scale x-ray sources via a bottom-up approach where phenomena effective at a micro/nanoscale are coordinated for macro-scale effect. The bottom-up approach holds potential to address all the applications previously mentioned but its entitlement extends into new applications with much more ground-breaking potential. One such bottom-up application is the distributed x-ray source platform. In the medical space, using an array of microscale x-ray sources instead of a single source promises significant reductions in patient dose as well as smaller feature detectability and fewer image artifacts. Cold cathode field emitters are ideal for this application because they can be gated electrostatically or via photonic excitation, they do not generate excessive heat like other common electron emitters, they have higher brightness and they are relatively compact. This document describes how ZnO nanowire field emitter arrays are well suited for distributed x-ray source applications because they hold promise in each of the following critical areas: emission stability, simple scalable fabrication, performance, radiation resistance and photonic coupling.
Are implicit self-esteem measures valid for assessing individual and cultural differences?
Falk, Carl F; Heine, Steven J; Takemura, Kosuke; Zhang, Cathy X J; Hsu, Chih-Wei
2015-02-01
Our research utilized two popular theoretical conceptualizations of implicit self-esteem: 1) implicit self-esteem as a global automatic reaction to the self; and 2) implicit self-esteem as a context/domain specific construct. Under this framework, we present an extensive search for implicit self-esteem measure validity among different cultural groups (Study 1) and under several experimental manipulations (Study 2). In Study 1, Euro-Canadians (N = 107), Asian-Canadians (N = 187), and Japanese (N = 112) completed a battery of implicit self-esteem, explicit self-esteem, and criterion measures. Included implicit self-esteem measures were either popular or provided methodological improvements upon older methods. Criterion measures were sampled from previous research on implicit self-esteem and included self-report and independent ratings. In Study 2, Americans (N = 582) completed a shorter battery of these same types of measures under either a control condition, an explicit prime meant to activate the self-concept in a particular context, or prime meant to activate self-competence related implicit attitudes. Across both studies, explicit self-esteem measures far outperformed implicit self-esteem measures in all cultural groups and under all experimental manipulations. Implicit self-esteem measures are not valid for individual or cross-cultural comparisons. We speculate that individuals may not form implicit associations with the self as an attitudinal object. © 2013 Wiley Periodicals, Inc.
Implicit self-esteem decreases in adolescence: a cross-sectional study.
Cai, Huajian; Wu, Mingzheng; Luo, Yu L L; Yang, Jing
2014-01-01
Implicit self-esteem has remained an active research topic in both the areas of implicit social cognition and self-esteem in recent decades. The purpose of this study is to explore the development of implicit self-esteem in adolescents. A total of 599 adolescents from junior and senior high schools in East China participated in the study. They ranged in age from 11 to 18 years with a mean age of 14.10 (SD = 2.16). The degree of implicit self-esteem was assessed using the Implicit Association Test (IAT) with the improved D score as the index. Participants also completed the Rosenberg Self-Esteem Scale (α = 0.77). For all surveyed ages, implicit self-esteem was positively biased, all ts>8.59, all ps<0.001. The simple correlation between implicit self-esteem and age was significant, r = -.25, p = 1. 10(-10). A regression with implicit self-esteem as the criterion variable, and age, gender, and age × gender interaction as predictors further revealed the significant negative linear relationship between age and implicit self-esteem, β = -0.19, t = -3.20, p = 0.001. However, explicit self-esteem manifested a reverse "U" shape throughout adolescence. Implicit self-esteem in adolescence manifests a declining trend with increasing age, suggesting that it is sensitive to developmental or age-related changes. This finding enriches our understanding of the development of implicit social cognition.
Using Implicit Measures to Highlight Science Teachers' Implicit Theories of Intelligence
ERIC Educational Resources Information Center
Mascret, Nicolas; Roussel, Peggy; Cury, François
2015-01-01
Using an innovative method, a Single-Target Implicit Association Test (ST-IAT) was created to explore the implicit theories of intelligence among science and liberal arts teachers and their relationships with their gender. The results showed that for science teachers--especially for male teachers--there was a negative implicit association between…
The Roles of Implicit Understanding of Engineering Ethics in Student Teams' Discussion.
Lee, Eun Ah; Grohman, Magdalena; Gans, Nicholas R; Tacca, Marco; Brown, Matthew J
2017-12-01
Following previous work that shows engineering students possess different levels of understanding of ethics-implicit and explicit-this study focuses on how students' implicit understanding of engineering ethics influences their team discussion process, in cases where there is significant divergence between their explicit and implicit understanding. We observed student teams during group discussions of the ethical issues involved in their engineering design projects. Through the micro-scale discourse analysis based on cognitive ethnography, we found two possible ways in which implicit understanding influenced the discussion. In one case, implicit understanding played the role of intuitive ethics-an intuitive judgment followed by reasoning. In the other case, implicit understanding played the role of ethical insight, emotionally guiding the direction of the discussion. In either case, however, implicit understanding did not have a strong influence, and the conclusion of the discussion reflected students' explicit understanding. Because students' implicit understanding represented broader social implication of engineering design in both cases, we suggest to take account of students' relevant implicit understanding in engineering education, to help students become more socially responsible engineers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt
Significant problems facing all experimental andcomputationalsciences arise from growing data size and complexity. Commonto allthese problems is the need to perform efficient data I/O ondiversecomputer architectures. In our scientific application, thelargestparallel particle simulations generate vast quantitiesofsix-dimensional data. Such a simulation run produces data foranaggregate data size up to several TB per run. Motived by the needtoaddress data I/O and access challenges, we have implemented H5Part,anopen source data I/O API that simplifies the use of the HierarchicalDataFormat v5 library (HDF5). HDF5 is an industry standard forhighperformance, cross-platform data storage and retrieval that runsonall contemporary architectures from large parallel supercomputerstolaptops. H5Part, whichmore » is oriented to the needs of the particlephysicsand cosmology communities, provides support for parallelstorage andretrieval of particles, structured and in the future unstructuredmeshes.In this paper, we describe recent work focusing on I/O supportforparticles and structured meshes and provide data showing performance onmodernsupercomputer architectures like the IBM POWER 5.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solano, M.; Chang, H.; VanDyke, J.
1996-12-31
This paper describes the implementation and results of portable, production-scale 3D Pre-stack Kirchhoff depth migration software. Full volume pre-stack imaging was applied to a six million trace (46.9 Gigabyte) data set from a subsalt play in the Garden Banks area in the Gulf of Mexico. The velocity model building and updating, were derived using image depth gathers and an image-driven strategy. After three velocity iterations, depth migrated sections revealed drilling targets that were not visible in the conventional 3D post-stack time migrated data set. As expected from the implementation of the migration algorithm, it was found that amplitudes are wellmore » preserved and anomalies associated with known reservoirs, conform to petrophysical predictions. Image gathers for velocity analysis and the final depth migrated volume, were generated on an 1824 node Intel Paragon at Sandia National Laboratories. The code has been successfully ported to a CRAY (T3D) and Unix workstation Parallel Virtual Machine environments (PVM).« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particlesmore » and in grid related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.« less
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
WaveJava: Wavelet-based network computing
NASA Astrophysics Data System (ADS)
Ma, Kun; Jiao, Licheng; Shi, Zhuoer
1997-04-01
Wavelet is a powerful theory, but its successful application still needs suitable programming tools. Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi- threaded, dynamic language. This paper addresses the design and development of a cross-platform software environment for experimenting and applying wavelet theory. WaveJava, a wavelet class library designed by the object-orient programming, is developed to take advantage of the wavelets features, such as multi-resolution analysis and parallel processing in the networking computing. A new application architecture is designed for the net-wide distributed client-server environment. The data are transmitted with multi-resolution packets. At the distributed sites around the net, these data packets are done the matching or recognition processing in parallel. The results are fed back to determine the next operation. So, the more robust results can be arrived quickly. The WaveJava is easy to use and expand for special application. This paper gives a solution for the distributed fingerprint information processing system. It also fits for some other net-base multimedia information processing, such as network library, remote teaching and filmless picture archiving and communications.
Weck, Florian; Höfling, Volkmar
2015-01-01
Two adaptations of the Implicit Association Task were used to assess implicit anxiety (IAT-Anxiety) and implicit health attitudes (IAT-Hypochondriasis) in patients with hypochondriasis (n = 58) and anxiety patients (n = 71). Explicit anxieties and health attitudes were assessed using questionnaires. The analysis of several multitrait-multimethod models indicated that the low correlation between explicit and implicit measures of health attitudes is due to the substantial methodological differences between the IAT and the self-report questionnaire. Patients with hypochondriasis displayed significantly more dysfunctional explicit and implicit health attitudes than anxiety patients, but no differences were found regarding explicit and implicit anxieties. The study demonstrates the specificity of explicit and implicit dysfunctional health attitudes among patients with hypochondriasis.
Can implicit appraisal concepts produce emotion-specific effects? A focus on unfairness and anger.
Tong, Eddie M W; Tan, Deborah H; Tan, Yan Lin
2013-06-01
This research examined whether the non-conscious activation of an implicit appraisal concept could affect responses associated with the corresponding emotion as predicted by appraisal theories. Explicit and implicit emotional responses were examined. We focused on implicit unfairness and its effect on anger. The results show that subliminal activation of implicit unfairness affected implicit anger responses (anger facial expression and latency responses to anger words) but not explicit anger feelings (i.e., reported anger). The non-conscious effect of implicit unfairness was specific to anger, as no effect on sadness, fear, and guilt was found. Copyright © 2013 Elsevier Inc. All rights reserved.
The Intergenerational Transmission of Implicit and Explicit Attitudes Toward Smoking
Sherman, Steven J.; Chassin, Laurie; Presson, Clark; Seo, Dong-Chul; Macy, Jonathan T.
2009-01-01
This study examined the intergenerational transmission of implicit and explicit attitudes toward smoking, as well as the role of these attitudes in adolescents’ smoking initiation. There was evidence of intergenerational transmission of implicit attitudes. Mothers who had more positive implicit attitudes had children with more positive implicit attitudes. In turn, these positive implicit attitudes of adolescents predicted their smoking initiation 18-months later. Moreover, these effects were obtained above and beyond the effects of explicit attitudes. These findings provide the first evidence that the intergenerational transmission of implicit cognition may play a role in the intergenerational transmission of an addictive behavior. PMID:20126293
On the nature of implicit soul beliefs: when the past weighs more than the present.
Anglin, Stephanie M
2015-06-01
Intuitive childhood beliefs in dualism may lay the foundation for implicit soul and afterlife beliefs, which may diverge from explicit beliefs formed later in adulthood. Brief Implicit Association Tests were developed to investigate the relation of implicit soul and afterlife beliefs to childhood and current beliefs. Early but not current beliefs covaried with implicit beliefs. Results demonstrated greater discrepancies in current than in childhood soul and afterlife beliefs among religious groups, and no differences in implicit beliefs. These findings suggest that implicit soul and afterlife beliefs diverge from current self-reported beliefs, stemming instead from childhood beliefs. © 2014 The British Psychological Society.
NASA Technical Reports Server (NTRS)
Reuther, James; Alonso, Juan Jose; Rimlinger, Mark J.; Jameson, Antony
1996-01-01
This work describes the application of a control theory-based aerodynamic shape optimization method to the problem of supersonic aircraft design. The design process is greatly accelerated through the use of both control theory and a parallel implementation on distributed memory computers. Control theory is employed to derive the adjoint differential equations whose solution allows for the evaluation of design gradient information at a fraction of the computational cost required by previous design methods. The resulting problem is then implemented on parallel distributed memory architectures using a domain decomposition approach, an optimized communication schedule, and the MPI (Message Passing Interface) Standard for portability and efficiency. The final result achieves very rapid aerodynamic design based on higher order computational fluid dynamics methods (CFD). In our earlier studies, the serial implementation of this design method was shown to be effective for the optimization of airfoils, wings, wing-bodies, and complex aircraft configurations using both the potential equation and the Euler equations. In our most recent paper, the Euler method was extended to treat complete aircraft configurations via a new multiblock implementation. Furthermore, during the same conference, we also presented preliminary results demonstrating that this basic methodology could be ported to distributed memory parallel computing architectures. In this paper, our concern will be to demonstrate that the combined power of these new technologies can be used routinely in an industrial design environment by applying it to the case study of the design of typical supersonic transport configurations. A particular difficulty of this test case is posed by the propulsion/airframe integration.
Changes of Explicit and Implicit Stigma in Medical Students during Psychiatric Clerkship.
Wang, Peng-Wei; Ko, Chih-Hung; Chen, Cheng-Sheng; Yang, Yi-Hsin Connine; Lin, Huang-Chi; Cheng, Cheng-Chung; Tsang, Hin-Yeung; Wu, Ching-Kuan; Yen, Cheng-Fang
2016-04-01
This study examines the differences in explicit and implicit stigma between medical and non-medical undergraduate students at baseline; the changes of explicit and implicit stigma in medical undergraduate and non-medical undergraduate students after a 1-month psychiatric clerkship and 1-month follow-up period; and the differences in the changes of explicit and implicit stigma between medical and non-medical undergraduate students. Seventy-two medical undergraduate students and 64 non-medical undergraduate students were enrolled. All participants were interviewed at intake and after 1 month. The Taiwanese version of the Stigma Assessment Scale and the Implicit Association Test were used to measure the participants' explicit and implicit stigma. Neither explicit nor implicit stigma differed between two groups at baseline. The medical, but not the non-medical, undergraduate students had a significant decrease in explicit stigma during the 1-month period of follow-up. Neither the medical nor the non-medical undergraduate students exhibited a significant change in implicit stigma during the one-month of follow-up, however. There was an interactive effect between group and time on explicit stigma but not on implicit stigma. Explicit but not implicit stigma toward mental illness decreased in the medical undergraduate students after a psychiatric clerkship. Further study is needed to examine how to improve implicit stigma toward mental illness.
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale
NASA Astrophysics Data System (ADS)
Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.
2016-12-01
Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model ( 6.5M 250m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model ( 4.5M 10km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCRGLOB-WB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
On a model of three-dimensional bursting and its parallel implementation
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.
2008-04-01
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a second-order accurate in both space and time, linearly-implicit finite difference method in equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared space address paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the paralled implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice more efficient than the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
The firehose instability during multiple reconnection in the Earth's magnetotail
NASA Astrophysics Data System (ADS)
Alexandrova, Alexandra; Divin, Andrey; Retino, Alessandro; Deca, Jan; Catapano, Filomena; Cozzani, Giulia
2017-04-01
We found unique events in the Cluster spacecraft observations of the Earth's magnetotail which correspond to the case of multiple reconnection sites. The ion temperature anisotropy of more energized ions in the direction parallel to the magnetic field, rather than in the perpendicular direction, is observed in the region of dynamical interaction between two active X-lines. The magnetic field and plasma parameters associated with the anisotropy correspond to the firehose instability conditions. We discuss possible scenarios of development of the firehose instability in multiple reconnection by comparing the observations with numerical simulations. Conventional Particle-in-Cell simulations of 2D magnetic reconnection starting from Harris equilibria are performed using implicit PIC code iPIC3D [Markidis, 2010]. At earlier stages the evolution creates fronts which push the weakly magnetized current sheet plasma away from the X-line. Fronts accelerate and reflect particles, producing parallel ion beams and increasing parallel ion temperature ahead of the front. If multiple X-lines are present, then the counterstreaming ion beams appear inside the original current sheet between colliding reconnection jet fronts. For large enough parallel ion pressure anisotropy, the firehose-like mode is excited inside the original current sheet with a flapping-like appearance along the X GSM direction but not Y GSM (current) direction. One should note that our simulations do not include the Bz magnetic field component (normal to the current sheet), hence ion beams cannot escape into the lobes and the whole region between two colliding fronts is unstable to firehose-like instability. In the Earth's magnetotail such configuration likely occurs when two active X-lines are close enough to each other, similar to a few cases we found in the Cluster observations.
Intact implicit learning in autism spectrum conditions.
Brown, Jamie; Aczel, Balazs; Jiménez, Luis; Kaufman, Scott Barry; Grant, Kate Plaisted
2010-09-01
Individuals with autism spectrum condition (ASC) have diagnostic impairments in skills that are associated with an implicit acquisition; however, it is not clear whether ASC individuals show specific implicit learning deficits. We compared ASC and typically developing (TD) individuals matched for IQ on five learning tasks: four implicit learning tasks--contextual cueing, serial reaction time, artificial grammar learning, and probabilistic classification learning tasks--that used procedures expressly designed to minimize the use of explicit strategies, and one comparison explicit learning task, paired associates learning. We found implicit learning to be intact in ASC. Beyond no evidence of differences, there was evidence of statistical equivalence between the groups on all the implicit learning tasks. This was not a consequence of compensation by explicit learning ability or IQ. Furthermore, there was no evidence to relate implicit learning to ASC symptomatology. We conclude that implicit mechanisms are preserved in ASC and propose that it is disruption by other atypical processes that impact negatively on the development of skills associated with an implicit acquisition.
Berry, Tanya R; Rodgers, Wendy M; Divine, Alison; Hall, Craig
2018-06-19
Discrepancies between automatically activated associations (i.e., implicit evaluations) and explicit evaluations of motives (measured with a questionnaire) could lead to greater information processing to resolve discrepancies or self-regulatory failures that may affect behavior. This research examined the relationship of health and appearance exercise-related explicit-implicit evaluative discrepancies, the interaction between implicit and explicit evaluations, and the combined value of explicit and implicit evaluations (i.e., the summed scores) to dropout from a yearlong exercise program. Participants (N = 253) completed implicit health and appearance measures and explicit health and appearance motives at baseline, prior to starting the exercise program. The sum of implicit and explicit appearance measures was positively related to weeks in the program, and discrepancy between the implicit and explicit health measures was negatively related to length of time in the program. Implicit exercise evaluations and their relationships to oft-cited motives such as appearance and health may inform exercise dropout.
Implicit Self-Esteem Decreases in Adolescence: A Cross-Sectional Study
Cai, Huajian; Wu, Mingzheng; Luo, Yu L. L.; Yang, Jing
2014-01-01
Implicit self-esteem has remained an active research topic in both the areas of implicit social cognition and self-esteem in recent decades. The purpose of this study is to explore the development of implicit self-esteem in adolescents. A total of 599 adolescents from junior and senior high schools in East China participated in the study. They ranged in age from 11 to 18 years with a mean age of 14.10 (SD = 2.16). The degree of implicit self-esteem was assessed using the Implicit Association Test (IAT) with the improved D score as the index. Participants also completed the Rosenberg Self-Esteem Scale (α = 0.77). For all surveyed ages, implicit self-esteem was positively biased, all ts>8.59, all ps<0.001. The simple correlation between implicit self-esteem and age was significant, r = −.25, p = 1.0×10−10. A regression with implicit self-esteem as the criterion variable, and age, gender, and age × gender interaction as predictors further revealed the significant negative linear relationship between age and implicit self-esteem, β = −0.19, t = −3.20, p = 0.001. However, explicit self-esteem manifested a reverse “U” shape throughout adolescence. Implicit self-esteem in adolescence manifests a declining trend with increasing age, suggesting that it is sensitive to developmental or age-related changes. This finding enriches our understanding of the development of implicit social cognition. PMID:24587169
Towards an explicit account of implicit learning.
Forkstam, Christian; Petersson, Karl Magnus
2005-08-01
The human brain supports acquisition mechanisms that can extract structural regularities implicitly from experience without the induction of an explicit model. Reber defined the process by which an individual comes to respond appropriately to the statistical structure of the input ensemble as implicit learning. He argued that the capacity to generalize to new input is based on the acquisition of abstract representations that reflect underlying structural regularities in the acquisition input. We focus this review of the implicit learning literature on studies published during 2004 and 2005. We will not review studies of repetition priming ('implicit memory'). Instead we focus on two commonly used experimental paradigms: the serial reaction time task and artificial grammar learning. Previous comprehensive reviews can be found in Seger's 1994 article and the Handbook of Implicit Learning. Emerging themes include the interaction between implicit and explicit processes, the role of the medial temporal lobe, developmental aspects of implicit learning, age-dependence, the role of sleep and consolidation. The attempts to characterize the interaction between implicit and explicit learning are promising although not well understood. The same can be said about the role of sleep and consolidation. Despite the fact that lesion studies have relatively consistently suggested that the medial temporal lobe memory system is not necessary for implicit learning, a number of functional magnetic resonance studies have reported medial temporal lobe activation in implicit learning. This issue merits further research. Finally, the clinical relevance of implicit learning remains to be determined.
NASA Astrophysics Data System (ADS)
Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike
The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Parallelization and checkpointing of GPU applications through program transformation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solano-Quinde, Lizandro Damian
2012-01-01
GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solvemore » the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.« less
Xia, Yidong; Lou, Jialin; Luo, Hong; ...
2015-02-09
Here, an OpenACC directive-based graphics processing unit (GPU) parallel scheme is presented for solving the compressible Navier–Stokes equations on 3D hybrid unstructured grids with a third-order reconstructed discontinuous Galerkin method. The developed scheme requires the minimum code intrusion and algorithm alteration for upgrading a legacy solver with the GPU computing capability at very little extra effort in programming, which leads to a unified and portable code development strategy. A face coloring algorithm is adopted to eliminate the memory contention because of the threading of internal and boundary face integrals. A number of flow problems are presented to verify the implementationmore » of the developed scheme. Timing measurements were obtained by running the resulting GPU code on one Nvidia Tesla K20c GPU card (Nvidia Corporation, Santa Clara, CA, USA) and compared with those obtained by running the equivalent Message Passing Interface (MPI) parallel CPU code on a compute node (consisting of two AMD Opteron 6128 eight-core CPUs (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)). Speedup factors of up to 24× and 1.6× for the GPU code were achieved with respect to one and 16 CPU cores, respectively. The numerical results indicate that this OpenACC-based parallel scheme is an effective and extensible approach to port unstructured high-order CFD solvers to GPU computing.« less
Ultralow-Power Electronic Trapping of Nanoparticles with Sub-10 nm Gold Nanogap Electrodes.
Barik, Avijit; Chen, Xiaoshu; Oh, Sang-Hyun
2016-10-12
We demonstrate nanogap electrodes for rapid, parallel, and ultralow-power trapping of nanoparticles. Our device pushes the limit of dielectrophoresis by shrinking the separation between gold electrodes to sub-10 nm, thereby creating strong trapping forces at biases as low as the 100 mV ranges. Using high-throughput atomic layer lithography, we manufacture sub-10 nm gaps between 0.8 mm long gold electrodes and pattern them into individually addressable parallel electronic traps. Unlike pointlike junctions made by electron-beam lithography or larger micron-gap electrodes that are used for conventional dielectrophoresis, our sub-10 nm gold nanogap electrodes provide strong trapping forces over a mm-scale trapping zone. Importantly, our technology solves the key challenges associated with traditional dielectrophoresis experiments, such as high voltages that cause heat generation, bubble formation, and unwanted electrochemical reactions. The strongly enhanced fields around the nanogap induce particle-transport speed exceeding 10 μm/s and enable the trapping of 30 nm polystyrene nanoparticles using an ultralow bias of 200 mV. We also demonstrate rapid electronic trapping of quantum dots and nanodiamond particles on arrays of parallel traps. Our sub-10 nm gold nanogap electrodes can be combined with plasmonic sensors or nanophotonic circuitry, and their low-power electronic operation can potentially enable high-density integration on a chip as well as portable biosensing.
Trendel, Olivier; Werle, Carolina O C
2016-09-01
Eating behaviors largely result from automatic processes. Yet, in existing research, automatic or implicit attitudes toward food often fail to predict eating behaviors. Applying findings in cognitive neuroscience research, we propose and find that a central reason why implicit attitudes toward food are not good predictors of eating behaviors is that implicit attitudes are driven by two distinct constructs that often have diverging evaluative consequences: the automatic affective reactions to food (e.g., tastiness; the affective basis of implicit attitudes) and the automatic cognitive reactions to food (e.g., healthiness; the cognitive basis of implicit attitudes). More importantly, we find that the affective and cognitive bases of implicit attitudes directly and uniquely influence actual food choices under different conditions. While the affective basis of implicit attitude is the main driver of food choices, it is the only driver when cognitive resources during choice are limited. The cognitive basis of implicit attitudes uniquely influences food choices when cognitive resources during choice are plentiful but only for participants low in impulsivity. Researchers interested in automatic processes in eating behaviors could thus benefit by distinguishing between the affective and cognitive bases of implicit attitudes. Copyright © 2015 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Heroux, Michael A.; Barrett, Richard F.
The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community.more » However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.« less
McDermott, K B; Roediger, H L
1996-03-01
Three experiments examined whether a conceptual implicit memory test (specifically, category instance generation) would exhibit repetition effects similar to those found in free recall. The transfer appropriate processing account of dissociations among memory tests led us to predict that the tests would show parallel effects; this prediction was based upon the theory's assumption that conceptual tests will behave similarly as a function of various independent variables. In Experiment 1, conceptual repetition (i.e., following a target word [e.g., puzzles] with an associate [e.g., jigsaw]) did not enhance priming on the instance generation test relative to the condition of simply presenting the target word once, although this manipulation did affect free recall. In Experiment 2, conceptual repetition was achieved by following a picture with its corresponding word (or vice versa). In this case, there was an effect of conceptual repetition on free recall but no reliable effect on category instance generation or category cued recall. In addition, we obtained a picture superiority effect in free recall but not in category instance generation. In the third experiment, when the same study sequence was used as in Experiment 1, but with instructions that encouraged relational processing, priming on the category instance generation task was enhanced by conceptual repetition. Results demonstrate that conceptual memory tests can be dissociated and present problems for Roediger's (1990) transfer appropriate processing account of dissociations between explicit and implicit tests.
Autistic Traits Moderate the Impact of Reward Learning on Social Behaviour.
Panasiti, Maria Serena; Puzzo, Ignazio; Chakrabarti, Bhismadev
2016-04-01
A deficit in empathy has been suggested to underlie social behavioural atypicalities in autism. A parallel theoretical account proposes that reduced social motivation (i.e., low responsivity to social rewards) can account for the said atypicalities. Recent evidence suggests that autistic traits modulate the link between reward and proxy metrics related to empathy. Using an evaluative conditioning paradigm to associate high and low rewards with faces, a previous study has shown that individuals high in autistic traits show reduced spontaneous facial mimicry of faces associated with high vs. low reward. This observation raises the possibility that autistic traits modulate the magnitude of evaluative conditioning. To test this, we investigated (a) if autistic traits could modulate the ability to implicitly associate a reward value to a social stimulus (reward learning/conditioning, using the Implicit Association Task, IAT); (b) if the learned association could modulate participants' prosocial behaviour (i.e., social reciprocity, measured using the cyberball task); (c) if the strength of this modulation was influenced by autistic traits. In 43 neurotypical participants, we found that autistic traits moderated the relationship of social reward learning on prosocial behaviour but not reward learning itself. This evidence suggests that while autistic traits do not directly influence social reward learning, they modulate the relationship of social rewards with prosocial behaviour. © 2015 The Authors Autism Research published by Wiley Periodicals, Inc. on behalf of International Society for Autism Research.
Lin, Paul T.; Heroux, Michael A.; Barrett, Richard F.; ...
2015-07-30
The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community.more » However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.« less
‘Alzheimer’s Progression Score’: Development of a Biomarker Summary Outcome for AD Prevention Trials
Leoutsakos, J.-M.; Gross, A.L.; Jones, R.N.; Albert, M.S.; Breitner, J.C.S.
2017-01-01
BACKGROUND Alzheimer’s disease (AD) prevention research requires methods for measurement of disease progression not yet revealed by symptoms. Preferably, such measurement should encompass multiple disease markers. OBJECTIVES Evaluate an item response theory (IRT) model-based latent variable Alzheimer Progression Score (APS) that uses multi-modal disease markers to estimate pre-clinical disease progression. DESIGN Estimate APS scores in the BIOCARD observational study, and in the parallel PREVENT-AD Cohort and its sister INTREPAD placebo-controlled prevention trial. Use BIOCARD data to evaluate whether baseline and early APS trajectory predict later progression to MCI/dementia. Similarly, use longitudinal PREVENT-AD data to assess test measurement invariance over time. Further, assess portability of the PREVENT-AD IRT model to baseline INTREPAD data, and explore model changes when CSF markers are added or withdrawn. SETTING BIOCARD was established in 1995 and participants were followed up to 20 years in Baltimore, USA. The PREVENT-AD and INTREPAD trial cohorts were established between 2011–2015 in Montreal, Canada, using nearly identical entry criteria to enroll high-risk cognitively normal persons aged 60+ then followed for several years. PARTICIPANTS 349 cognitively normal, primarily middle-aged participants in BIOCARD, 125 high-risk participants aged 60+ in PREVENT-AD, and 217 similar subjects in INTREPAD. 106 INTREPAD participants donated up to four serial CSF samples. MEASUREMENTS Global cognitive assessment and multiple structural, functional, and diffusion MRI metrics, sensori-neural tests, and CSF concentrations of tau, Aβ42 and their ratio. RESULTS Both baseline values and early slope of APS scores in BIOCARD predicted later progression to MCI or AD. Presence of CSF variables strongly improved such prediction. A similarly derived APS in PREVENT-AD showed measurement invariance over time and portability to the parallel INTREPAD sample. CONCLUSIONS An IRT-based APS can summarize multimodal information to provide a longitudinal measure of pre-clinical AD progression, and holds promise as an outcome for AD prevention trials. PMID:29034223
Leoutsakos, J-M; Gross, A L; Jones, R N; Albert, M S; Breitner, J C S
2016-01-01
Alzheimer's disease (AD) prevention research requires methods for measurement of disease progression not yet revealed by symptoms. Preferably, such measurement should encompass multiple disease markers. Evaluate an item response theory (IRT) model-based latent variable Alzheimer Progression Score (APS) that uses multi-modal disease markers to estimate pre-clinical disease progression. Estimate APS scores in the BIOCARD observational study, and in the parallel PREVENT-AD Cohort and its sister INTREPAD placebo-controlled prevention trial. Use BIOCARD data to evaluate whether baseline and early APS trajectory predict later progression to MCI/dementia. Similarly, use longitudinal PREVENT-AD data to assess test measurement invariance over time. Further, assess portability of the PREVENT-AD IRT model to baseline INTREPAD data, and explore model changes when CSF markers are added or withdrawn. BIOCARD was established in 1995 and participants were followed up to 20 years in Baltimore, USA. The PREVENT-AD and INTREPAD trial cohorts were established between 2011-2015 in Montreal, Canada, using nearly identical entry criteria to enroll high-risk cognitively normal persons aged 60+ then followed for several years. 349 cognitively normal, primarily middle-aged participants in BIOCARD, 125 high-risk participants aged 60+ in PREVENT-AD, and 217 similar subjects in INTREPAD. 106 INTREPAD participants donated up to four serial CSF samples. Global cognitive assessment and multiple structural, functional, and diffusion MRI metrics, sensori-neural tests, and CSF concentrations of tau, Aβ42 and their ratio. Both baseline values and early slope of APS scores in BIOCARD predicted later progression to MCI or AD. Presence of CSF variables strongly improved such prediction. A similarly derived APS in PREVENT-AD showed measurement invariance over time and portability to the parallel INTREPAD sample. An IRT-based APS can summarize multimodal information to provide a longitudinal measure of pre-clinical AD progression, and holds promise as an outcome for AD prevention trials.
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package
NASA Astrophysics Data System (ADS)
Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin
2015-01-01
A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular)meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell)more » approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.« less
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
Numerical simulation of weakly ionized hypersonic flow over reentry capsules
NASA Astrophysics Data System (ADS)
Scalabrin, Leonardo C.
The mathematical and numerical formulation employed in the development of a new multi-dimensional Computational Fluid Dynamics (CFD) code for the simulation of weakly ionized hypersonic flows in thermo-chemical non-equilibrium over reentry configurations is presented. The flow is modeled using the Navier-Stokes equations modified to include finite-rate chemistry and relaxation rates to compute the energy transfer between different energy modes. The set of equations is solved numerically by discretizing the flowfield using unstructured grids made of any mixture of quadrilaterals and triangles in two-dimensions or hexahedra, tetrahedra, prisms and pyramids in three-dimensions. The partial differential equations are integrated on such grids using the finite volume approach. The fluxes across grid faces are calculated using a modified form of the Steger-Warming Flux Vector Splitting scheme that has low numerical dissipation inside boundary layers. The higher order extension of inviscid fluxes in structured grids is generalized in this work to be used in unstructured grids. Steady state solutions are obtained by integrating the solution over time implicitly. The resulting sparse linear system is solved by using a point implicit or by a line implicit method in which a tridiagonal matrix is assembled by using lines of cells that are formed starting at the wall. An algorithm that assembles these lines using completely general unstructured grids is developed. The code is parallelized to allow simulation of computationally demanding problems. The numerical code is successfully employed in the simulation of several hypersonic entry flows over space capsules as part of its validation process. Important quantities for the aerothermodynamics design of capsules such as aerodynamic coefficients and heat transfer rates are compared to available experimental and flight test data and other numerical results yielding very good agreement. A sensitivity analysis of predicted radiative heating of a space capsule to several thermo-chemical non-equilibrium models is also performed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Guangye; Chacon, Luis; Barnes, Daniel C
2012-01-01
Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been developed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230, 18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver and is capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle orbit integrations from the field solver, while remaining fully self-consistent. This provides great flexibility, and dramatically improves the solver efficiency by reducing the degrees of freedom of the associated nonlinear system. However, it requires a particle push per nonlinearmore » residual evaluation, which makes the particle push the most time-consuming operation in the algorithm. This paper describes a very efficient mixed-precision, hybrid CPU-GPU implementation of the implicit PIC algorithm. The JFNK solver is kept on the CPU (in double precision), while the inherent data parallelism of the particle mover is exploited by implementing it in single-precision on a graphics processing unit (GPU) using CUDA. Performance-oriented optimizations, with the aid of an analytical performance model, the roofline model, are employed. Despite being highly dynamic, the adaptive, charge-conserving particle mover algorithm achieves up to 300 400 GOp/s (including single-precision floating-point, integer, and logic operations) on a Nvidia GeForce GTX580, corresponding to 20 25% absolute GPU efficiency (against the peak theoretical performance) and 50-70% intrinsic efficiency (against the algorithm s maximum operational throughput, which neglects all latencies). This is about 200-300 times faster than an equivalent serial CPU implementation. When the single-precision GPU particle mover is combined with a double-precision CPU JFNK field solver, overall performance gains 100 vs. the double-precision CPU-only serial version are obtained, with no apparent loss of robustness or accuracy when applied to a challenging long-time scale ion acoustic wave simulation.« less
NASA Astrophysics Data System (ADS)
Kim, Cheolhwan; Kim, Kyu-Jung; Ha, Man Yeong
To investigate the possibility of the portable application of a direct borohydride fuel cell (DBFC), weight reduction of the stack and high stacking of the cells are investigated for practical running conditions. For weight reduction, carbon graphite is adopted as the bipolar plate material even though it has disadvantages in tight stacking, which results in stacking loss from insufficient material strength. For high stacking, it is essential to have a uniform fuel distribution among cells and channels to maintain equal electric load on each cell. In particular, the design of the anode channel is important because active hydrogen generation causes non-uniformity in the fuel flow-field of the cells and channels. To reduce the disadvantages of stacking force margin and fuel maldistribution, an O-ring type-sealing system with an internal manifold and a parallel anode channel design is adopted, and the characteristics of a single and a five-cell fuel cell stack are analyzed. By adopting carbon graphite, the stack weight can be reduced by 4.2 times with 12% of performance degradation from the insufficient stacking force. When cells are stacked, the performance exceeds the single-cell performance because of the stack temperature increase from the reduction of the radiation area from the narrow stacking of cells.
On-road particle number measurements using a portable emission measurement system (PEMS)
NASA Astrophysics Data System (ADS)
Gallus, Jens; Kirchner, Ulf; Vogt, Rainer; Börensen, Christoph; Benter, Thorsten
2016-01-01
In this study the on-road particle number (PN) performance of a Euro-5 direct-injection (DI) gasoline passenger car was investigated. PN emissions were measured using the prototype of a portable emission measurement system (PEMS). PN PEMS correlations with chassis dynamometer tests show a good agreement with a chassis dynamometer set-up down to emissions in the range of 1·1010 #/km. Parallel on-line soot measurements by a photo acoustic soot sensor (PASS) were applied as independent measurement technique and indicate a good on-road performance for the PN-PEMS. PN-to-soot ratios were 1.3·1012 #/mg, which was comparable for both test cell and on-road measurements. During on-road trips different driving styles as well as different road types were investigated. Comparisons to the world harmonized light-duty test cycle (WLTC) 5.3 and to European field operational test (euroFOT) data indicate the PEMS trips to be representative for normal driving. Driving situations in varying traffic seem to be a major contributor to a high test-to-test variability of PN emissions. However, there is a trend to increasing PN emissions with more severe driving styles. A cold start effect is clearly visible for PN, especially at low ambient temperatures down to 8 °C.
Suslow, Thomas; Lindner, Christian; Kugel, Harald; Egloff, Boris; Schmukle, Stefan C
2014-08-30
There is evidence from research based on self-report personality measures that schizophrenia patients tend to be lower in extraversion and higher in neuroticism than healthy individuals. Self-report personality measures assess aspects of the explicit self-concept. The Implicit Association Test (IAT) has been developed to assess aspects of implicit cognition such as implicit attitudes and implicit personality traits. The present study was conducted to investigate the applicability and reliability of the IAT in schizophrenia patients and test whether they differ from healthy individuals on implicitly measured extraversion and neuroticism. The IAT and the NEO-FFI were administered as implicit and explicit measures of extraversion and neuroticism to 34 schizophrenia patients and 45 healthy subjects. For all IAT scores satisfactory to good reliabilities were observed in the patient sample. In both study groups, IAT scores were not related to NEO-FFI scores. Schizophrenia patients were lower in implicit and explicit extraversion and higher in implicit and explicit neuroticism than healthy individuals. Our data show that the IAT can be reliably applied to schizophrenia patients and suggest that they differ from healthy individuals not only in their conscious representation but also in their implicit representation of the self with regard to neuroticism and extraversion-related characteristics. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Implicit Associations and Explicit Expectancies toward Cannabis in Heavy Cannabis Users and Controls
Beraha, Esther M.; Cousijn, Janna; Hermanides, Elisa; Goudriaan, Anna E.; Wiers, Reinout W.
2013-01-01
Cognitive biases, including implicit memory associations are thought to play an important role in the development of addictive behaviors. The aim of the present study was to investigate implicit affective memory associations in heavy cannabis users. Implicit positive-arousal, sedation, and negative associations toward cannabis were measured with three Single Category Implicit Association Tests (SC-IAT’s) and compared between 59 heavy cannabis users and 89 controls. Moreover, we investigated the relationship between these implicit affective associations and explicit expectancies, subjective craving, cannabis use, and cannabis related problems. Results show that heavy cannabis users had stronger implicit positive-arousal associations but weaker implicit negative associations toward cannabis compared to controls. Moreover, heavy cannabis users had stronger sedation but weaker negative explicit expectancies toward cannabis compared to controls. Within heavy cannabis users, more cannabis use was associated with stronger implicit negative associations whereas more cannabis use related problems was associated with stronger explicit negative expectancies, decreasing the overall difference on negative associations between cannabis users and controls. No other associations were observed between implicit associations, explicit expectancies, measures of cannabis use, cannabis use related problems, or subjective craving. These findings indicate that, in contrast to other substances of abuse like alcohol and tobacco, the relationship between implicit associations and cannabis use appears to be weak in heavy cannabis users. PMID:23801968
Implicit and explicit ethnocentrism: revisiting the ideologies of prejudice.
Cunningham, William A; Nezlek, John B; Banaji, Mahzarin R
2004-10-01
Two studies investigated relationships among individual differences in implicit and explicit prejudice, right-wing ideology, and rigidity in thinking. The first study examined these relationships focusing on White Americans' prejudice toward Black Americans. The second study provided the first test of implicit ethnocentrism and its relationship to explicit ethnocentrism by studying the relationship between attitudes toward five social groups. Factor analyses found support for both implicit and explicit ethnocentrism. In both studies, mean explicit attitudes toward out groups were positive, whereas implicit attitudes were negative, suggesting that implicit and explicit prejudices are distinct; however, in both studies, implicit and explicit attitudes were related (r = .37, .47). Latent variable modeling indicates a simple structure within this ethnocentric system, with variables organized in order of specificity. These results lead to the conclusion that (a) implicit ethnocentrism exists and (b) it is related to and distinct from explicit ethnocentrism.
Gambling and Sport: Implicit Association and Explicit Intention Among Underage Youth.
Li, En; Langham, Erika; Browne, Matthew; Rockloff, Matthew; Thorne, Hannah
2018-03-23
This study examined whether an implicit association existed between gambling and sport among underage youth in Australia, and whether this implicit association could shape their explicit intention to gamble. A sample of 14-17 year old Australian participants completed two phases of tasks, including an implicit association test based online experiment, and a post-experiment online survey. The results supported the existence of an implicit association between gambling and sport among the participants. This implicit association became stronger when they saw sport-relevant (vs. sport-irrelevant) gambling logos, or gambling-relevant (vs. gambling-irrelevant) sport names. In addition, this implicit association was positively related to the amount of sport viewing, but only among those participants who had more favorable gambling attitudes. Lastly, gambling attitudes and advertising knowledge, rather than the implicit association, turned out to be significant predictors of the explicit intention to gamble.
Smith, Colin Tucker; De Houwer, Jan
2015-01-01
Because implicit evaluations are thought to underlie many aspects of behavior, researchers have started looking for ways to change them. We examine whether and when persuasive messages alter strongly held implicit evaluations of smoking. In smokers, an affective anti-smoking message led to more negative implicit evaluations on four different implicit measures as compared to a cognitive anti-smoking message which seemed to backfire. Additional analyses suggested that the observed effects were mediated by the feelings and emotions raised by the messages. In non-smokers, both the affective and cognitive message engendered slightly more negative implicit evaluations. We conclude that persuasive messages change implicit evaluations in a way that depends on properties of the message and of the participant. Thus, our data open new avenues for research directed at tailoring persuasive messages to change implicit evaluations. PMID:26557099
Smith, Colin Tucker; De Houwer, Jan
2015-01-01
Because implicit evaluations are thought to underlie many aspects of behavior, researchers have started looking for ways to change them. We examine whether and when persuasive messages alter strongly held implicit evaluations of smoking. In smokers, an affective anti-smoking message led to more negative implicit evaluations on four different implicit measures as compared to a cognitive anti-smoking message which seemed to backfire. Additional analyses suggested that the observed effects were mediated by the feelings and emotions raised by the messages. In non-smokers, both the affective and cognitive message engendered slightly more negative implicit evaluations. We conclude that persuasive messages change implicit evaluations in a way that depends on properties of the message and of the participant. Thus, our data open new avenues for research directed at tailoring persuasive messages to change implicit evaluations.
Senese, Vincenzo Paolo; Venuti, Paola; Giordano, Francesca; Napolitano, Maria; Esposito, Gianluca; Bornstein, Marc H
2017-09-01
In this study a novel auditory version of the Single Category Implicit Association Test (SC-IAT-A) was developed to investigate (a) the valence of adults' associations to infant cries and laughs, (b) moderation of implicit associations by gender and empathy, and (c) the robustness of implicit associations controlling for auditory sensitivity. Eighty adults (50% females) were administered two SC-IAT-As, the Empathy Quotient, and the Weinstein Noise Sensitivity Scale. Adults showed positive implicit associations to infant laugh and negative ones to infant cry; only the implicit associations with the infant laugh were negatively related to empathy scores, and no gender differences were observed. Finally, implicit associations to infant cry were affected by noise sensitivity. The SC-IAT-A is useful to evaluate the valence of implicit reactions to infant auditory cues and could provide fresh insights into understanding processes that regulate the quality of adult-infant relationships.
Self-Regulation and Implicit Attitudes Toward Physical Activity Influence Exercise Behavior.
Padin, Avelina C; Emery, Charles F; Vasey, Michael; Kiecolt-Glaser, Janice K
2017-08-01
Dual-process models of health behavior posit that implicit and explicit attitudes independently drive healthy behaviors. Prior evidence indicates that implicit attitudes may be related to weekly physical activity (PA) levels, but the extent to which self-regulation attenuates this link remains unknown. This study examined the associations between implicit attitudes and self-reported PA during leisure time among 150 highly active young adults and evaluated the extent to which effortful control (one aspect of self-regulation) moderated this relationship. Results indicated that implicit attitudes toward exercise were unrelated to average workout length among individuals with higher effortful control. However, those with lower effortful control and more negative implicit attitudes reported shorter average exercise sessions compared with those with more positive attitudes. Implicit and explicit attitudes were unrelated to total weekly PA. A combination of poorer self-regulation and negative implicit attitudes may leave individuals vulnerable to mental and physical health consequences of low PA.
ERIC Educational Resources Information Center
Barnes-Holmes, Dermot; Murtagh, Louise; Barnes-Holmes, Yvonne; Stewart, Ian
2010-01-01
The current study aimed to assess the implicit attitudes of vegetarians and non-vegetarians towards meat and vegetables, using the Implicit Association Test (IAT) and the Implicit Relational Assessment Procedure (IRAP). Both measures involved asking participants to respond, under time pressure, to pictures of meat or vegetables as either positive…
NASA Astrophysics Data System (ADS)
Stellmach, Stephan; Hansen, Ulrich
2008-05-01
Numerical simulations of the process of convection and magnetic field generation in planetary cores still fail to reach geophysically realistic control parameter values. Future progress in this field depends crucially on efficient numerical algorithms which are able to take advantage of the newest generation of parallel computers. Desirable features of simulation algorithms include (1) spectral accuracy, (2) an operation count per time step that is small and roughly proportional to the number of grid points, (3) memory requirements that scale linear with resolution, (4) an implicit treatment of all linear terms including the Coriolis force, (5) the ability to treat all kinds of common boundary conditions, and (6) reasonable efficiency on massively parallel machines with tens of thousands of processors. So far, algorithms for fully self-consistent dynamo simulations in spherical shells do not achieve all these criteria simultaneously, resulting in strong restrictions on the possible resolutions. In this paper, we demonstrate that local dynamo models in which the process of convection and magnetic field generation is only simulated for a small part of a planetary core in Cartesian geometry can achieve the above goal. We propose an algorithm that fulfills the first five of the above criteria and demonstrate that a model implementation of our method on an IBM Blue Gene/L system scales impressively well for up to O(104) processors. This allows for numerical simulations at rather extreme parameter values.
Parallelisation study of a three-dimensional environmental flow model
NASA Astrophysics Data System (ADS)
O'Donncha, Fearghal; Ragnoli, Emanuele; Suits, Frank
2014-03-01
There are many simulation codes in the geosciences that are serial and cannot take advantage of the parallel computational resources commonly available today. One model important for our work in coastal ocean current modelling is EFDC, a Fortran 77 code configured for optimal deployment on vector computers. In order to take advantage of our cache-based, blade computing system we restructured EFDC from serial to parallel, thereby allowing us to run existing models more quickly, and to simulate larger and more detailed models that were previously impractical. Since the source code for EFDC is extensive and involves detailed computation, it is important to do such a port in a manner that limits changes to the files, while achieving the desired speedup. We describe a parallelisation strategy involving surgical changes to the source files to minimise error-prone alteration of the underlying computations, while allowing load-balanced domain decomposition for efficient execution on a commodity cluster. The use of conjugate gradient posed particular challenges due to implicit non-local communication posing a hindrance to standard domain partitioning schemes; a number of techniques are discussed to address this in a feasible, computationally efficient manner. The parallel implementation demonstrates good scalability in combination with a novel domain partitioning scheme that specifically handles mixed water/land regions commonly found in coastal simulations. The approach presented here represents a practical methodology to rejuvenate legacy code on a commodity blade cluster with reasonable effort; our solution has direct application to other similar codes in the geosciences.
Efficient iterative methods applied to the solution of transonic flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.
1996-02-01
We investigate the use of an inexact Newton`s method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton`s method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GIVIRES method. The preconditionermore » is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton- GIVIRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.« less
A New Numerical Scheme for Cosmic-Ray Transport
NASA Astrophysics Data System (ADS)
Jiang, Yan-Fei; Oh, S. Peng
2018-02-01
Numerical solutions of the cosmic-ray (CR) magnetohydrodynamic equations are dogged by a powerful numerical instability, which arises from the constraint that CRs can only stream down their gradient. The standard cure is to regularize by adding artificial diffusion. Besides introducing ad hoc smoothing, this has a significant negative impact on either computational cost or complexity and parallel scalings. We describe a new numerical algorithm for CR transport, with close parallels to two-moment methods for radiative transfer under the reduced speed of light approximation. It stably and robustly handles CR streaming without any artificial diffusion. It allows for both isotropic and field-aligned CR streaming and diffusion, with arbitrary streaming and diffusion coefficients. CR transport is handled explicitly, while source terms are handled implicitly. The overall time step scales linearly with resolution (even when computing CR diffusion) and has a perfect parallel scaling. It is given by the standard Courant condition with respect to a constant maximum velocity over the entire simulation domain. The computational cost is comparable to that of solving the ideal MHD equation. We demonstrate the accuracy and stability of this new scheme with a wide variety of tests, including anisotropic streaming and diffusion tests, CR-modified shocks, CR-driven blast waves, and CR transport in multiphase media. The new algorithm opens doors to much more ambitious and hitherto intractable calculations of CR physics in galaxies and galaxy clusters. It can also be applied to other physical processes with similar mathematical structure, such as saturated, anisotropic heat conduction.
Merritt, Stephanie M; Heimbaugh, Heather; LaChapell, Jennifer; Lee, Deborah
2013-06-01
This study is the first to examine the influence of implicit attitudes toward automation on users' trust in automation. Past empirical work has examined explicit (conscious) influences on user level of trust in automation but has not yet measured implicit influences. We examine concurrent effects of explicit propensity to trust machines and implicit attitudes toward automation on trust in an automated system. We examine differential impacts of each under varying automation performance conditions (clearly good, ambiguous, clearly poor). Participants completed both a self-report measure of propensity to trust and an Implicit Association Test measuring implicit attitude toward automation, then performed an X-ray screening task. Automation performance was manipulated within-subjects by varying the number and obviousness of errors. Explicit propensity to trust and implicit attitude toward automation did not significantly correlate. When the automation's performance was ambiguous, implicit attitude significantly affected automation trust, and its relationship with propensity to trust was additive: Increments in either were related to increases in trust. When errors were obvious, a significant interaction between the implicit and explicit measures was found, with those high in both having higher trust. Implicit attitudes have important implications for automation trust. Users may not be able to accurately report why they experience a given level of trust. To understand why users trust or fail to trust automation, measurements of implicit and explicit predictors may be necessary. Furthermore, implicit attitude toward automation might be used as a lever to effectively calibrate trust.
Using the Implicit Association Test to Assess Children's Implicit Attitudes toward Smoking.
Andrews, Judy A; Hampson, Sarah E; Greenwald, Anthony G; Gordon, Judith; Widdop, Chris
2010-09-01
The development and psychometric properties of an Implicit Association Test (IAT) measuring implicit attitude toward smoking among fifth grade children were described. The IAT with "sweets" as the contrast category resulted in higher correlations with explicit attitudes than did the IAT with "healthy foods" as the contrast category. Children with family members who smoked (versus non-smoking) and children who were high in sensation seeking (versus low) had a significantly more favorable implicit attitude toward smoking. Further, implicit attitudes became less favorable after engaging in tobacco prevention activities targeting risk perceptions of addiction. Results support the reliability and validity of this version of the IAT and illustrate its usefulness in assessing young children's implicit attitude toward smoking.
Using the Implicit Association Test to Assess Children's Implicit Attitudes toward Smoking
Andrews, Judy A.; Hampson, Sarah E.; Greenwald, Anthony G.; Gordon, Judith; Widdop, Chris
2009-01-01
The development and psychometric properties of an Implicit Association Test (IAT) measuring implicit attitude toward smoking among fifth grade children were described. The IAT with “sweets” as the contrast category resulted in higher correlations with explicit attitudes than did the IAT with “healthy foods” as the contrast category. Children with family members who smoked (versus non-smoking) and children who were high in sensation seeking (versus low) had a significantly more favorable implicit attitude toward smoking. Further, implicit attitudes became less favorable after engaging in tobacco prevention activities targeting risk perceptions of addiction. Results support the reliability and validity of this version of the IAT and illustrate its usefulness in assessing young children's implicit attitude toward smoking. PMID:21566676
Power effects on implicit prejudice and stereotyping: The role of intergroup face processing.
Schmid, Petra C; Amodio, David M
2017-04-01
Power is thought to increase discrimination toward subordinate groups, yet its effect on different forms of implicit bias remains unclear. We tested whether power enhances implicit racial stereotyping, in addition to implicit prejudice (i.e., evaluative associations), and examined the effect of power on the automatic processing of faces during implicit tasks. Study 1 showed that manipulated high power increased both forms of implicit bias, relative to low power. Using a neural index of visual face processing (the N170 component of the ERP), Study 2 revealed that power affected the encoding of White ingroup vs. Black outgroup faces. Whereas high power increased the relative processing of outgroup faces during evaluative judgments in the prejudice task, it decreased the relative processing of outgroup faces during stereotype trait judgments. An indirect effect of power on implicit prejudice through enhanced processing of outgroup versus ingroup faces suggested a potential link between face processing and implicit bias. Together, these findings demonstrate that power can affect implicit prejudice and stereotyping as well as early processing of racial ingroup and outgroup faces.
Implicit Attitudes toward the Self Over Time in Chinese Undergraduates
Yang, Qing; Zhao, Yufang; Guan, Lili; Huang, Xiting
2017-01-01
Although the explicit attitudes of Chinese people toward the self over time are known (i.e., past = present < future), little is known about their implicit attitudes. Two studies were conducted to measure the implicit subjective temporal trajectory (STT) of Chinese undergraduates. Study 1 used a Go/No-go association task to measure participants’ implicit attitudes toward their past, present, and future selves. The obtained implicit STT was different from the explicit pattern found in former research. It showed that the future self was viewed to be identical to the present self and participants implicitly evaluated their present self as better than the past self. Since this comparison of the past and present selves suggested a cultural difference, we aimed to replicate this finding in Study 2. Using an implicit association test, we again found that the present self was more easily associated with positive valence than the past self. Overall, both studies reveal an implicitly inclining-flat STT (i.e., past < present = future) for Chinese undergraduates. Implications of this difference in explicit-implicit measures and the cultural differences of temporal self appraisals are discussed. PMID:29163291
Implicit Motives and Men’s Perceived Constraint in Fatherhood
Ruppen, Jessica; Waldvogel, Patricia; Ehlert, Ulrike
2016-01-01
Research shows that implicit motives influence social relationships. However, little is known about their role in fatherhood and, particularly, how men experience their paternal role. Therefore, this study examined the association of implicit motives and fathers’ perceived constraint due to fatherhood. Furthermore, we explored their relation to fathers’ life satisfaction. Participants were fathers with biological children (N = 276). They were asked to write picture stories, which were then coded for implicit affiliation and power motives. Perceived constraint and life satisfaction were assessed on a visual analog scale. A higher implicit need for affiliation was significantly associated with lower perceived constraint, whereas the implicit need for power had the opposite effect. Perceived constraint had a negative influence on life satisfaction. Structural equation modeling revealed significant indirect effects of implicit affiliation and power motives on life satisfaction mediated by perceived constraint. Our findings indicate that men with a higher implicit need for affiliation experience less constraint due to fatherhood, resulting in higher life satisfaction. The implicit need for power, however, results in more perceived constraint and is related to decreased life satisfaction. PMID:27933023
Explicit integration with GPU acceleration for large kinetic networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brock, Benjamin; Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830; Belt, Andrew
2015-12-01
We demonstrate the first implementation of recently-developed fast explicit kinetic integration algorithms on modern graphics processing unit (GPU) accelerators. Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve of order 100 realistic kinetic networks in parallel in the same time that standard implicit methods can solve a single such network on a CPU. This orders-of-magnitude decrease in computation time for solving systems of realistic kinetic networks implies that important coupled, multiphysics problems inmore » various scientific and technical fields that were intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible.« less
Investigation of chemically reacting and radiating supersonic internal flows
NASA Technical Reports Server (NTRS)
Mani, M.; Tiwari, S. N.
1986-01-01
The two-dimensional spatially elliptic Navier-Stokes equations are used to investigate the chemically reacting and radiating supersonic flow of the hydrogen-air system between two parallel plates and in a channel with a ten degree compression-expansion ramp at the lower boundary. The explicit unsplit finite-difference technique of MacCormack is used to advance the governing equations in time until convergence is achieved. The chemistry source term in the species equation is treated implicitly to alleviate the stiffness associated with fast reactions. The tangent slab approximation is employed in the radiative flux formation. Both pseudo-gray and nongray models are used to represent the absorption characteristics of the participating species. Results obtained for specific conditions indicate that the radiative interaction can have a significant influence on the flow field.
The 3-D numerical simulation research of vacuum injector for linear induction accelerator
NASA Astrophysics Data System (ADS)
Liu, Dagang; Xie, Mengjun; Tang, Xinbing; Liao, Shuqing
2017-01-01
Simulation method for voltage in-feed and electron injection of vacuum injector is given, and verification of the simulated voltage and current is carried out. The numerical simulation for the magnetic field of solenoid is implemented, and a comparative analysis is conducted between the simulation results and experimental results. A semi-implicit difference algorithm is adopted to suppress the numerical noise, and a parallel acceleration algorithm is used for increasing the computation speed. The RMS emittance calculation method of the beam envelope equations is analyzed. In addition, the simulated results of RMS emittance are compared with the experimental data. Finally, influences of the ferromagnetic rings on the radial and axial magnetic fields of solenoid as well as the emittance of beam are studied.
Multi-partitioning for ADI-schemes on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1994-01-01
A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
Optical systolic solutions of linear algebraic equations
NASA Technical Reports Server (NTRS)
Neuman, C. P.; Casasent, D.
1984-01-01
The philosophy and data encoding possible in systolic array optical processor (SAOP) were reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as: matrix-decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS
NASA Astrophysics Data System (ADS)
Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.
2018-03-01
Simulation of breaking waves by using Navier-Stokes equation via moving particle semi-implicit method (MPS) over close domain is given. The results show the parallel computing on multicore architecture using OpenMP platform can reduce the computational time almost half of the serial time. Here, the comparison using two computer architectures (AMD and Intel) are performed. The results using Intel architecture is shown better than AMD architecture in CPU time. However, in efficiency, the computer with AMD architecture gives slightly higher than the Intel. For the simulation by 1512 number of particles, the CPU time using Intel and AMD are 12662.47 and 28282.30 respectively. Moreover, the efficiency using similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.
Health Care Providers’ Implicit and Explicit Attitudes Toward Lesbian Women and Gay Men
Riskind, Rachel G.; Nosek, Brian A.
2015-01-01
Objectives. We examined providers’ implicit and explicit attitudes toward lesbian and gay people by provider gender, sexual identity, and race/ethnicity. Methods. We examined attitudes toward heterosexual people versus lesbian and gay people in Implicit Association Test takers: 2338 medical doctors, 5379 nurses, 8531 mental health providers, 2735 other treatment providers, and 214 110 nonproviders in the United States and internationally between May 2006 and December 2012. We characterized the sample with descriptive statistics and calculated Cohen d, a standardized effect size measure, with 95% confidence intervals. Results. Among heterosexual providers, implicit preferences always favored heterosexual people over lesbian and gay people. Implicit preferences for heterosexual women were weaker than implicit preferences for heterosexual men. Heterosexual nurses held the strongest implicit preference for heterosexual men over gay men (Cohen d = 1.30; 95% confidence interval = 1.28, 1.32 among female nurses; Cohen d = 1.38; 95% confidence interval = 1.32, 1.44 among male nurses). Among all groups, explicit preferences for heterosexual versus lesbian and gay people were weaker than implicit preferences. Conclusions. Implicit preferences for heterosexual people versus lesbian and gay people are pervasive among heterosexual health care providers. Future research should investigate how implicit sexual prejudice affects care. PMID:26180976
Implicit attitudes towards homosexuality: reliability, validity, and controllability of the IAT.
Banse, R; Seise, J; Zerbes, N
2001-01-01
Two experiments were conducted to investigate the psychometric properties of an Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) that was adapted to measure implicit attitudes towards homosexuality. In a first experiment, the validity of the Homosexuality-IAT was tested using a known group approach. Implicit and explicit attitudes were assessed in heterosexual and homosexual men and women (N = 101). The results provided compelling evidence for the convergent and discriminant validity of the Homosexuality-IAT as a measure of implicit attitudes. No evidence was found for two alternative explanations of IAT effects (familiarity with stimulus material and stereotype knowledge). The internal consistency of IAT scores was satisfactory (alpha s > .80), but retest correlations were lower. In a second experiment (N = 79) it was shown that uninformed participants were able to fake positive explicit but not implicit attitudes. Discrepancies between implicit and explicit attitudes towards homosexuality could be partially accounted for by individual differences in the motivation to control prejudiced behavior, thus providing independent evidence for the validity of the implicit attitude measure. Neither explicit nor implicit attitudes could be changed by persuasive messages. The results of both experiments are interpreted as evidence for a single construct account of implicit and explicit attitudes towards homosexuality.
Keng, Shian-Ling; Seah, Stanley T H; Tong, Eddie M W; Smoski, Moria
2016-08-01
Mindfulness-based interventions have been shown to be effective in alleviating depressive symptoms. While much work has examined the effects of mindfulness training on subjective symptoms and experiences, and less is known regarding whether mindfulness training may alter relatively uncontrollable cognitive processes associated with depressed mood, particularly implicit dysfunctional attitudes. The present study examined the effects of a brief mindful acceptance induction on implicit dysfunctional attitudes and degree of concordance between implicit and explicit dysfunctional attitudes in the context of sad mood. A total of 79 adult participants with elevated depressive symptoms underwent an autobiographical mood induction procedure before being randomly assigned to mindful acceptance or thought wandering inductions. Results showed that the effect of mindful acceptance on implicit dysfunctional attitude was significantly moderated by trait mindfulness. Participants high on trait mindfulness demonstrated significant improvements in implicit dysfunctional attitudes following the mindful acceptance induction. Those low on trait mindfulness demonstrated significantly worse implicit dysfunctional attitudes following the induction. Significantly greater levels of concordance between implicit and explicit dysfunctional attitudes were observed in the mindful acceptance condition versus the thought wandering condition. The findings highlight changes in implicit dysfunctional attitudes and improvements in self-concordance as two potential mechanisms underlying the effects of mindfulness-based interventions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Implicit-explicit (IMEX) Runge-Kutta methods for non-hydrostatic atmospheric models
NASA Astrophysics Data System (ADS)
Gardner, David J.; Guerra, Jorge E.; Hamon, François P.; Reynolds, Daniel R.; Ullrich, Paul A.; Woodward, Carol S.
2018-04-01
The efficient simulation of non-hydrostatic atmospheric dynamics requires time integration methods capable of overcoming the explicit stability constraints on time step size arising from acoustic waves. In this work, we investigate various implicit-explicit (IMEX) additive Runge-Kutta (ARK) methods for evolving acoustic waves implicitly to enable larger time step sizes in a global non-hydrostatic atmospheric model. The IMEX formulations considered include horizontally explicit - vertically implicit (HEVI) approaches as well as splittings that treat some horizontal dynamics implicitly. In each case, the impact of solving nonlinear systems in each implicit ARK stage in a linearly implicit fashion is also explored. The accuracy and efficiency of the IMEX splittings, ARK methods, and solver options are evaluated on a gravity wave and baroclinic wave test case. HEVI splittings that treat some vertical dynamics explicitly do not show a benefit in solution quality or run time over the most implicit HEVI formulation. While splittings that implicitly evolve some horizontal dynamics increase the maximum stable step size of a method, the gains are insufficient to overcome the additional cost of solving a globally coupled system. Solving implicit stage systems in a linearly implicit manner limits the solver cost but this is offset by a reduction in step size to achieve the desired accuracy for some methods. Overall, the third-order ARS343 and ARK324 methods performed the best, followed by the second-order ARS232 and ARK232 methods.
Characterizing Implicit Mental Health Associations across Clinical Domains
Werntz, Alexandra J.; Steinman, Shari A.; Glenn, Jeffrey J.; Nock, Matthew K.; Teachman, Bethany A.
2016-01-01
Background and objectives Implicit associations are relatively uncontrollable associations between concepts in memory. The current investigation focuses on implicit associations in four mental health domains (alcohol use, anxiety, depression, and eating disorders) and how these implicit associations: a) relate to explicit associations and b) self-reported clinical symptoms within the same domains, and c) vary based on demographic characteristics (age, gender, race, ethnicity, and education). Methods Participants (volunteers over age 18 to a research website) completed implicit association (Implicit Association Tests), explicit association (self+psychopathology or attitudes toward food, using semantic differential items), and symptom measures at the Project Implicit Mental Health website tied to: alcohol use (N=12,387), anxiety (N=21,304), depression (N=24,126), or eating disorders (N=10,115). Results Within each domain, implicit associations showed small to moderate associations with explicit associations and symptoms, and predicted self-reported symptoms beyond explicit associations. In general, implicit association strength varied little by race and ethnicity, but showed small ties to age, gender, and education. Limitations This research was conducted on a public research and education website, where participants could take more than one of the studies. Conclusions Among a large and diverse sample, implicit associations in the four domains are congruent with explicit associations and self-reported symptoms, and also add to our prediction of self-reported symptoms over and above explicit associations, pointing to the potential future clinical utility and validity of using implicit association measures with diverse populations. PMID:26962979
Nakamura, Mitsuo; Hayakawa, Tomomi; Okamura, Aiko; Kohigashi, Mutsumi; Fukui, Kenji; Narumoto, Jin
2015-01-01
Background If delusions serve as a defense mechanism in schizophrenia patients with paranoia, then they should show normal or high explicit self-esteem and low implicit self-esteem. However, the results of previous studies are inconsistent. One possible explanation for this inconsistency is that there are two types of paranoia, “bad me” (self-blaming) paranoia and “poor me” (non-self-blaming) paranoia. We thus examined implicit and explicit self-esteem and self-blaming tendency in patients with schizophrenia and schizoaffective disorder. We hypothesized that patients with paranoia would show lower implicit self-esteem and only those with non-self-blaming paranoia would experience a discrepancy between explicit and implicit self-esteem. Methods Participants consisted of patients with schizophrenia and schizoaffective disorder recruited from a day hospital (N=71). Participants were assessed for psychotic symptoms, using the Brief Psychiatric Rating Scale (BPRS), and self-blaming tendency, using the brief COPE. We also assessed explicit self-esteem, using the Rosenberg Self-Esteem Scale (RSES), implicit self-esteem, using Brief Implicit Association Test (BIAT), and discrepancy between explicit and implicit self-esteem. Results Contrary to our hypothesis, implicit self-esteem in paranoia and nonparanoia showed no statistical difference. As expected, only patients with non-self-blaming paranoia experienced a discrepancy between explicit and implicit self-esteem; other groups showed no such discrepancy. Conclusion These results suggest that persecutory delusion plays a defensive role in non-self-blaming paranoia. PMID:25565849
Characterizing implicit mental health associations across clinical domains.
Werntz, Alexandra J; Steinman, Shari A; Glenn, Jeffrey J; Nock, Matthew K; Teachman, Bethany A
2016-09-01
Implicit associations are relatively uncontrollable associations between concepts in memory. The current investigation focuses on implicit associations in four mental health domains (alcohol use, anxiety, depression, and eating disorders) and how these implicit associations: a) relate to explicit associations and b) self-reported clinical symptoms within the same domains, and c) vary based on demographic characteristics (age, gender, race, ethnicity, and education). Participants (volunteers over age 18 to a research website) completed implicit association (Implicit Association Tests), explicit association (self + psychopathology or attitudes toward food, using semantic differential items), and symptom measures at the Project Implicit Mental Health website tied to: alcohol use (N = 12,387), anxiety (N = 21,304), depression (N = 24,126), or eating disorders (N = 10,115). Within each domain, implicit associations showed small to moderate associations with explicit associations and symptoms, and predicted self-reported symptoms beyond explicit associations. In general, implicit association strength varied little by race and ethnicity, but showed small ties to age, gender, and education. This research was conducted on a public research and education website, where participants could take more than one of the studies. Among a large and diverse sample, implicit associations in the four domains are congruent with explicit associations and self-reported symptoms, and also add to our prediction of self-reported symptoms over and above explicit associations, pointing to the potential future clinical utility and validity of using implicit association measures with diverse populations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Not explicit but implicit memory is influenced by individual perception style
Tsushima, Yoshiaki
2018-01-01
Not only explicit but also implicit memory has considerable influence on our daily life. However, it is still unclear whether explicit and implicit memories are sensitive to individual differences. Here, we investigated how individual perception style (global or local) correlates with implicit and explicit memory. As a result, we found that not explicit but implicit memory was affected by the perception style: local perception style people more greatly used implicit memory than global perception style people. These results help us to make the new effective application adapting to individual perception style and understand some clinical symptoms such as autistic spectrum disorder. Furthermore, this finding might give us new insight of memory involving consciousness and unconsciousness as well as relationship between implicit/explicit memory and individual perception style. PMID:29370212
Not explicit but implicit memory is influenced by individual perception style.
Hine, Kyoko; Tsushima, Yoshiaki
2018-01-01
Not only explicit but also implicit memory has considerable influence on our daily life. However, it is still unclear whether explicit and implicit memories are sensitive to individual differences. Here, we investigated how individual perception style (global or local) correlates with implicit and explicit memory. As a result, we found that not explicit but implicit memory was affected by the perception style: local perception style people more greatly used implicit memory than global perception style people. These results help us to make the new effective application adapting to individual perception style and understand some clinical symptoms such as autistic spectrum disorder. Furthermore, this finding might give us new insight of memory involving consciousness and unconsciousness as well as relationship between implicit/explicit memory and individual perception style.
Fujii, Tsutomu; Uebuchi, Hisashi; Yamada, Kotono; Saito, Masahiro; Ito, Eriko; Tonegawa, Akiko; Uebuchi, Marie
2015-06-01
The purposes of the present study were (a) to use both a relational-anxiety Go/No-Go Association Task (GNAT) and an avoidance-of-intimacy GNAT in order to assess an implicit Internal Working Model (IWM) of attachment; (b) to verify the effects of both measured implicit relational anxiety and implicit avoidance of intimacy on information processing. The implicit IWM measured by GNAT differed from the explicit IWM measured by questionnaires in terms of the effects on information processing. In particular, in subliminal priming tasks involving with others, implicit avoidance of intimacy predicted accelerated response times with negative stimulus words about attachment. Moreover, after subliminally priming stimulus words about self, implicit relational anxiety predicted delayed response times with negative stimulus words about attachment.
Wegener, Ingo; Geiser, Franziska; Alfter, Susanne; Mierke, Jan; Imbierowicz, Katrin; Kleiman, Alexandra; Koch, Anne Sarah; Conrad, Rupert
2015-04-01
Self-esteem has been claimed to be an important factor in the development and maintenance of depression. Whereas explicit self-esteem is usually reduced in depressed individuals, studies on implicitly measured self-esteem in depression exhibit a more heterogeneous pattern of results, and the role of implicit self-esteem in depression is still ambiguous. Previous research on implicit self-esteem compensation (ISEC) revealed that implicit self-esteem can mirror processes of self-esteem compensation under conditions that threaten self-esteem. We assume that depressed individuals experience a permanent threat to their selves resulting in enduring processes of ISEC. We hypothesize that ISEC as measured by implicit self-esteem will decrease when individuals recover from depression. 45 patients with major depression received an integrative in-patient treatment in the Psychosomatic University Hospital Bonn, Germany. Depression was measured by the depression score of the Hospital Anxiety and Depression Scale (HADS-D). Self-esteem was assessed explicitly using the Rosenberg Self-Esteem Scale (RSES) and implicitly by the Implicit Association Test (IAT) and the Name Letter Test (NLT). As expected for a successful treatment of depression, depression scores declined during the eight weeks of treatment and explicit self-esteem rose. In line with our hypothesis, both measures of implicit self-esteem decreased, indicating reduced processes of ISEC. It still remains unclear, under which conditions there is an overlap of measures of implicit and explicit self-esteem. The results lend support to the concept of ISEC and demonstrate the relevance of implicit self-esteem and self-esteem compensation for the understanding of depression. Copyright © 2014 Elsevier Inc. All rights reserved.
Kantak, Shailesh S; Mummidisetty, Chaithanya K; Stinear, James W
2012-09-01
Implicit and explicit memory systems for motor skills compete with each other during and after motor practice. Primary motor cortex (M1) is known to be engaged during implicit motor learning, while dorsal premotor cortex (PMd) is critical for explicit learning. To elucidate the neural substrates underlying the interaction between implicit and explicit memory systems, adults underwent a randomized crossover experiment of anodal transcranial direct current stimulation (AtDCS) applied over M1, PMd or sham stimulation during implicit motor sequence (serial reaction time task, SRTT) practice. We hypothesized that M1-AtDCS during practice will enhance online performance and offline learning of the implicit motor sequence. In contrast, we also hypothesized that PMd-AtDCS will attenuate performance and retention of the implicit motor sequence. Implicit sequence performance was assessed at baseline, at the end of acquisition (EoA), and 24 h after practice (retention test, RET). M1-AtDCS during practice significantly improved practice performance and supported offline stabilization compared with Sham tDCS. Performance change from EoA to RET revealed that PMd-AtDCS during practice attenuated offline stabilization compared with M1-AtDCS and sham stimulation. The results support the role of M1 in implementing online performance gains and offline stabilization for implicit motor sequence learning. In contrast, enhancing the activity within explicit motor memory network nodes such as the PMd during practice may be detrimental to offline stabilization of the learned implicit motor sequence. These results support the notion of competition between implicit and explicit motor memory systems and identify underlying neural substrates that are engaged in this competition. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
A paper-based microbial fuel cell: instant battery for disposable diagnostic devices.
Fraiwan, Arwa; Mukherjee, Sayantika; Sundermier, Steven; Lee, Hyung-Sool; Choi, Seokheun
2013-11-15
We present a microfabricated paper-based microbial fuel cell (MFC) generating a maximum power of 5.5 μW/cm(2). The MFC features (1) a paper-based proton exchange membrane by infiltrating sulfonated sodium polystyrene sulfonate and (2) micro-fabricated paper chambers by patterning hydrophobic barriers of photoresist. Once inoculum and catholyte were added to the MFC, a current of 74 μA was generated immediately. This paper-based MFC has the advantages of ease of use, low production cost, and high portability. The voltage produced was increased by 1.9 × when two MFC devices were stacked in series, while operating lifetime was significantly enhanced in parallel. Copyright © 2013 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Tony; Sutherland, James C.
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinarymore » team of engineers and computer scientists.« less
Capacitor-based detection of nuclear magnetization: nuclear quadrupole resonance of surfaces.
Gregorovič, Alan; Apih, Tomaž; Kvasić, Ivan; Lužnik, Janko; Pirnat, Janez; Trontelj, Zvonko; Strle, Drago; Muševič, Igor
2011-03-01
We demonstrate excitation and detection of nuclear magnetization in a nuclear quadrupole resonance (NQR) experiment with a parallel plate capacitor, where the sample is located between the two capacitor plates and not in a coil as usually. While the sensitivity of this capacitor-based detection is found lower compared to an optimal coil-based detection of the same amount of sample, it becomes comparable in the case of very thin samples and even advantageous in the proximity of conducting bodies. This capacitor-based setup may find its application in acquisition of NQR signals from the surface layers on conducting bodies or in a portable tightly integrated nuclear magnetic resonance sensor. Copyright © 2010 Elsevier Inc. All rights reserved.
A survey of program slicing for software engineering
NASA Technical Reports Server (NTRS)
Beck, Jon
1993-01-01
This research concerns program slicing which is used as a tool for program maintainence of software systems. Program slicing decreases the level of effort required to understand and maintain complex software systems. It was first designed as a debugging aid, but it has since been generalized into various tools and extended to include program comprehension, module cohesion estimation, requirements verification, dead code elimination, and maintainence of several software systems, including reverse engineering, parallelization, portability, and reuse component generation. This paper seeks to address and define terminology, theoretical concepts, program representation, different program graphs, developments in static slicing, dynamic slicing, and semantics and mathematical models. Applications for conventional slicing are presented, along with a prognosis of future work in this field.
Saad, Tony; Sutherland, James C.
2016-05-04
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinarymore » team of engineers and computer scientists.« less