implicitly parallel programming: Topics by Science.gov

Sample records for implicitly parallel programming

Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

USDA-ARS's Scientific Manuscript database

This paper presents the CCHE2D implicit flow model parallelized using the CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallel Alternating Direction Implicit (ADI) solver using the Parallel Cyclic Reduction (PCR) algorithm on the GPU is developed and tested. This solve...
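
The PCR idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the CCHE2D CUDA Fortran code: the function name `pcr_solve` and the coefficient layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are hypothetical. Each sweep eliminates, for every row at once, the couplings to the rows `stride` away, using only the previous sweep's arrays, so all rows of a sweep are independent (on a GPU, one thread per row); after about log2(n) sweeps every row is decoupled.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    The inner loop over i stands in for one parallel step: every row is
    updated only from the arrays of the previous sweep.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0], c[n - 1] = 0.0, 0.0  # no unknowns x[-1] or x[n]
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):  # independent iterations: one GPU thread each
            lo, hi = i - stride, i + stride
            # Out-of-range neighbours behave like identity rows (b = 1, others 0).
            al, bl, cl, dl = (a[lo], b[lo], c[lo], d[lo]) if lo >= 0 else (0.0, 1.0, 0.0, 0.0)
            ah, bh, ch, dh = (a[hi], b[hi], c[hi], d[hi]) if hi < n else (0.0, 1.0, 0.0, 0.0)
            k1 = a[i] / bl  # factor that eliminates x[i - stride]
            k2 = c[i] / bh  # factor that eliminates x[i + stride]
            na[i] = -al * k1
            nb[i] = b[i] - cl * k1 - ah * k2
            nc[i] = -ch * k2
            nd[i] = d[i] - dl * k1 - dh * k2
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # Every row now involves only its own unknown.
    return [d[i] / b[i] for i in range(n)]
```

In an ADI half-step there is one such system per grid line, so the lines offer a second, coarser level of parallelism on top of the rows within each line.
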
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized and combined with local preconditioning to construct an implicit parallel solver that obtains steady-state solutions of the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain decomposition is used to partition the computational domain among the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems and is found to reproduce identically the flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFLOPS has been achieved on 512 nodes of the Intel Delta machine for a problem size of 1024 K equations (256 K grid points).
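
As a rough consistency check on the reported figures, assuming the 2300 MFLOPS peak is the aggregate over all 512 nodes and that K denotes 1024, the per-node rate and problem density work out as:

$$
\frac{2300\ \text{MFLOPS}}{512\ \text{nodes}} \approx 4.5\ \text{MFLOPS per node},\qquad
\frac{1024\,\text{K equations}}{256\,\text{K grid points}} = 4\ \text{equations per grid point},\qquad
\frac{256 \times 1024\ \text{grid points}}{512\ \text{nodes}} = 512\ \text{grid points per node}.
$$
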
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. The approach achieves high computational efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized either explicitly, by parallel computation in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
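
The implicit bitwise parallelism mentioned for genetic programming can be illustrated with a small Python sketch: if each input feature is packed into an integer with one bit per fitness case, a single `&`, `|`, or `~` evaluates a Boolean node for all packed cases at once, and fitness reduces to a popcount. The 64-case word width, the Boolean function set, and the evolved expression in `evaluate` are hypothetical; the abstract does not specify the representation the authors used.

```python
MASK = (1 << 64) - 1  # assumed word width: 64 fitness cases per machine word


def pack(bits):
    """Pack one 0/1 value per fitness case into an int (case k -> bit k)."""
    word = 0
    for k, bit in enumerate(bits):
        word |= (bit & 1) << k
    return word


def evaluate(x0, x1, x2):
    # Hypothetical evolved individual: (x0 AND x1) OR (NOT x2).
    # Each bitwise operator evaluates its node for every packed case at once.
    return ((x0 & x1) | (~x2 & MASK)) & MASK


def fitness(output, target, n_cases):
    """Number of fitness cases where the classifier output equals the target."""
    case_mask = (1 << n_cases) - 1
    agree = ~(output ^ target) & case_mask  # 1 where output bit == target bit
    return bin(agree).count("1")
```

For example, with the eight cases k = 0..7 encoding all combinations of three inputs, one call to `evaluate` classifies all eight cases, and `fitness` scores them against a target word without any per-case loop.
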
A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids

DOE PAGES

Xia, Yidong; Luo, Hong; Frisbey, Megan; ...

2014-07-01

A set of implicit methods is proposed for a third-order hierarchical WENO reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids. An attractive feature of these methods is the use of a Jacobian matrix based on the P1 element approximation, which greatly reduces the memory requirement compared with DG (P2). Three approaches (analytical derivation, divided differencing, and automatic differentiation (AD)) are presented to construct the Jacobian matrix, of which the AD approach shows the best robustness. A variety of compressible flow problems are computed to demonstrate the fast convergence of the implemented flow solver. Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries indicate that this low-storage implicit method can provide a viable and attractive DG solution for complicated flows of practical importance.
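
To illustrate two of the three Jacobian-construction routes, the sketch below compares forward-mode automatic differentiation (dual numbers) with forward divided differencing on a small residual. It is not the DG solver's code: the two-equation `residual`, the `Dual` class, and the step size `eps` are assumptions chosen for the example. AD yields derivatives exact to rounding, while divided differencing carries a truncation error that depends on `eps`, which is one plausible reason AD can be the more robust choice.

```python
class Dual:
    """Forward-mode AD number val + der*e with e**2 = 0 (der carries the derivative)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def residual(u):
    # Hypothetical 2-equation residual R(u); works for floats and Dual numbers.
    u0, u1 = u
    return [u0 * u0 + u1 - 3.0, u0 * u1 - 1.0]


def ad_jacobian(f, u):
    """Jacobian by forward-mode AD: one pass per input, seeding its derivative to 1."""
    n = len(u)
    cols = []
    for j in range(n):
        seeded = [Dual(x, 1.0 if k == j else 0.0) for k, x in enumerate(u)]
        cols.append([r.der for r in f(seeded)])
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def fd_jacobian(f, u, eps=1e-7):
    """Jacobian by forward divided differences: column j = (R(u + eps*e_j) - R(u)) / eps."""
    r0 = f(u)
    jac = [[0.0] * len(u) for _ in r0]
    for j in range(len(u)):
        up = list(u)
        up[j] += eps
        rj = f(up)
        for i in range(len(r0)):
            jac[i][j] = (rj[i] - r0[i]) / eps
    return jac
```

At u = (1.5, 2.0) the analytical Jacobian is [[3, 1], [2, 1.5]]; `ad_jacobian` reproduces it to rounding, while `fd_jacobian` agrees only to roughly `eps`.
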
Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation

NASA Technical Reports Server (NTRS)

Kouatchou, Jules; Halem, Milton (Technical Monitor)

2000-01-01

We combine a high-order compact finite difference approximation and collocation techniques to numerically solve the two-dimensional heat equation. The resulting method is implicit and can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
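
For reference, the classical Crank-Nicolson baseline can be sketched for the one-dimensional heat equation u_t = alpha * u_xx with zero Dirichlet boundaries. This is a deliberate simplification and an assumption on our part (1D rather than 2D, second-order rather than high-order compact, no time-parallel collocation). It only shows what makes the method implicit: every step requires a tridiagonal solve, done here with the sequential Thomas algorithm, whose forward sweep is a recurrence in which each row depends on the previous one.

```python
import math


def thomas(a, b, c, d):
    """Sequential tridiagonal solve (Thomas algorithm); a[0] and c[-1] are unused."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward sweep: row i needs row i-1 (a recurrence)
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def crank_nicolson(u, alpha, dx, dt, steps):
    """Advance interior values u of u_t = alpha*u_xx (u = 0 at both ends)."""
    n = len(u)
    r = alpha * dt / dx ** 2
    # Left-hand operator (I - r/2 * D2) is tridiagonal and fixed across steps.
    a, b, c = [-r / 2] * n, [1 + r] * n, [-r / 2] * n
    for _ in range(steps):
        # Right-hand side (I + r/2 * D2) u^n; boundary values are zero.
        rhs = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            rhs.append((1 - r) * u[i] + (r / 2) * (left + right))
        u = thomas(a, b, c, rhs)
    return u


if __name__ == "__main__":
    n = 49
    dx = 1.0 / (n + 1)
    xs = [(i + 1) * dx for i in range(n)]
    u = crank_nicolson([math.sin(math.pi * x) for x in xs], 1.0, dx, 0.001, 100)
    exact = [math.exp(-math.pi ** 2 * 0.1) * math.sin(math.pi * x) for x in xs]
    print(max(abs(p - q) for p, q in zip(u, exact)))
```

Starting from sin(pi x), the single Fourier mode should decay approximately as exp(-pi^2 alpha t), which the `__main__` block uses as a check.
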
Implicit schemes and parallel computing in unstructured grid CFD

NASA Technical Reports Server (NTRS)

Venkatakrishnam, V.

1995-01-01

The development of implicit schemes for obtaining steady-state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. Next, the development of explicit and implicit schemes to compute unsteady flows on unstructured grids is discussed. Then the issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion are outlined. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

ERIC Educational Resources Information Center

Agarwal, Mayank

2009-01-01

The shift of the microprocessor industry towards multicore architectures has placed a heavy burden on programmers by requiring explicit parallelization for performance. Implicit parallelization is an alternative that could ease this burden by parallelizing applications "under the covers" while maintaining sequential semantics…
Parallelizing alternating direction implicit solver on GPUs

USDA-ARS's Scientific Manuscript database

We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

1997-01-01

In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
Proceedings of the Conference on Knowledge-Based Software Assistant (5th) Held in Liverpool, New York on 24-28 September 1990

DTIC Science & Technology

1991-03-01

factor which made TTL-design so powerful was the implicit knowledge that for any object in the TTL Databook, that object’s implementation and...functions as values. Thus, its reasoning power matches the descriptive power of the higher order languages in the previous section. First, the definitions...developing parallel algorithms to better utilize the power of the explicitly parallel programming language constructs. Currently, the methodologies
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

1992-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Parallelization of implicit finite difference schemes in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel

1990-01-01

Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code.

NASA Astrophysics Data System (ADS)

Chacon, L.; Knoll, D. A.

2004-11-01

We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended primitive-variable MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field, non-dissipative, and stable in the absence of physical dissipation (L. Chacón, Comput. Phys. Comm., submitted (2004)). PIXIE3D employs fully implicit Newton-Krylov methods for the time advance. Currently, first- and second-order implicit schemes are available, although higher-order temporal implicit schemes can be readily implemented within the Newton-Krylov framework. A successful, scalable, multigrid (MG) physics-based preconditioning strategy, similar in concept to previous 2D MHD efforts (L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002); J. Comput. Phys. 188 (2), 573-592 (2003)), has been developed. We are currently parallelizing the code using the PETSc library, with a Newton-Krylov-Schwarz approach for the parallel treatment of the preconditioner. In this poster, we will report on both the serial and parallel performance of PIXIE3D, focusing primarily on scalability and CPU speedup vs. an explicit approach.
Semantic Language Extensions for Implicit Parallel Programming

DTIC Science & Technology

2013-09-01

mobile CPU interacts with a GPU on the same device and a cloud based backend at a remote location presents endless possibilities for solving com...for his contribution to the compiler infrastructure. His creativity in solving research problems and expertise in architecting and implementing...
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

1999-01-01

This paper reviews three recent works on numerical methods for integrating ordinary differential equations (ODEs) that are specially designed for parallel, vector, and/or multi-processor-unit (PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of the ODE in the form of a Chebyshev polynomial of high (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead to an acceleration factor of around 50 on parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel, and the trial integrations are further accelerated by balancing the computational load among PUs with the technique of folding. The method is general-purpose and achieves an acceleration factor of around 3.5 using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators that require multiple corrections in solving implicit formulas, such as the implicit Hermite integrators (Makino and Aarseth, 1992; Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998, 1999).
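
The second approach (solving many implicit midpoint steps at once by fixed-point iteration) can be sketched in Python for a harmonic oscillator. In this sketch, which is an assumption rather than Saha et al.'s actual scheme, every step is updated from the previous iterate only, so each sweep over the steps could run with one processor per step; information still propagates roughly one step per iteration, so the iteration count tends to grow with the number of steps. The names `midpoint_all_steps` and `reference` are hypothetical; `reference` gives the closed-form result of sequential implicit-midpoint stepping for this linear problem.

```python
import math


def f(y):
    # Stand-in Hamiltonian system: harmonic oscillator q' = p, p' = -q.
    q, p = y
    return (p, -q)


def midpoint_all_steps(y0, h, n_steps, tol=1e-13, max_iter=10000):
    """Solve n_steps implicit-midpoint equations jointly by fixed-point iteration.

    Each equation y[n+1] = y[n] + h*f((y[n] + y[n+1])/2) is updated from the
    previous iterate only (Jacobi style), so all steps are independent
    within one sweep.
    """
    ys = [tuple(y0)] * (n_steps + 1)  # initial guess: constant trajectory
    for it in range(max_iter):
        new = [ys[0]]
        change = 0.0
        for n in range(n_steps):
            mid = tuple((a + b) / 2 for a, b in zip(ys[n], ys[n + 1]))
            nxt = tuple(a + h * fa for a, fa in zip(ys[n], f(mid)))
            change = max(change, max(abs(a - b) for a, b in zip(nxt, ys[n + 1])))
            new.append(nxt)
        ys = new
        if change < tol:
            return ys, it + 1
    raise RuntimeError("fixed-point iteration did not converge")


def reference(h, n_steps):
    """Sequential implicit midpoint from (q, p) = (1, 0): each step rotates by atan2(h, 1 - h^2/4)."""
    theta = math.atan2(h, 1.0 - h * h / 4.0)
    return (math.cos(n_steps * theta), -math.sin(n_steps * theta))
```

Because the implicit midpoint rule is symplectic, the converged trajectory should also conserve q^2 + p^2 up to the iteration tolerance.
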
A Comparison of Three Programming Models for Adaptive Applications

NASA Technical Reports Server (NTRS)

Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswa, Rupak; Kwak, Dochan (Technical Monitor)

2000-01-01

We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for the different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the distinction between explicit messages and implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration levels, which often leads to performance gains. However, it may also suffer from poor spatial locality of physically distributed shared data on large numbers of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models and generates a more balanced result for our application.
PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code

NASA Astrophysics Data System (ADS)

Chacon, Luis

2006-10-01

We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field to machine precision, non-dissipative, and linearly and nonlinearly stable in the absence of physical dissipation. PIXIE3D employs fully implicit Newton-Krylov methods for the time advance. Currently, second-order implicit schemes such as Crank-Nicolson and BDF2 (the second-order backward differentiation formula) are available. PIXIE3D is fully parallel (it employs PETSc for parallelism) and exhibits excellent parallel scalability. A parallel, scalable MG preconditioning strategy, based on physics-based preconditioning ideas, has been developed for resistive MHD and is currently being extended to Hall MHD. In this poster, we will report on progress in the algorithmic formulation for extended MHD, as well as the serial and parallel performance of PIXIE3D in a variety of problems and geometries. References: L. Chacón, Comput. Phys. Comm. 163 (3), 143-171 (2004); L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002); J. Comput. Phys. 188 (2), 573-592 (2003); L. Chacón, 32nd EPS Conf. Plasma Physics, Tarragona, Spain, 2005; L. Chacón et al., 33rd EPS Conf. Plasma Physics, Rome, Italy, 2006.
CPU/GPU Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

NASA Astrophysics Data System (ADS)

Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

2016-06-01

CPU/GPU computing allows scientists to greatly accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove many redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, because the memory of a single node becomes inadequate as the simulation size grows, we present a tri-level hybrid MPI-OpenMP-CUDA programming pattern that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU relative to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
A Novel Implementation of Massively Parallel Three Dimensional Monte Carlo Radiation Transport

NASA Astrophysics Data System (ADS)

Robinson, P. B.; Peterson, J. D. L.

2005-12-01

The goal of our summer project was to implement the difference formulation for radiation transport into Cosmos++, a multidimensional, massively parallel magnetohydrodynamics code for astrophysical applications (Peter Anninos - AX). The difference formulation is a new method for Symbolic Implicit Monte Carlo thermal transport (Brooks and Szöke - PAT). Previously, simultaneous implementation of fully implicit Monte Carlo radiation transport in multiple dimensions on multiple processors had not been convincingly demonstrated. We found that a combination of the difference formulation and the inherent structure of Cosmos++ makes such an implementation both accurate and straightforward. We developed a "nearly nearest neighbor physics" technique to allow each processor to work independently, even with a fully implicit code. This technique, coupled with the increased accuracy of an implicit Monte Carlo solution and the efficiency of parallel computing systems, allows us to demonstrate the possibility of massively parallel thermal transport. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48
A GPU-accelerated implicit meshless method for compressible flows

NASA Astrophysics Data System (ADS)

Zhang, Jia-Le; Ma, Zhi-Hua; Chen, Hong-Quan; Cao, Cheng

2018-05-01

This paper develops a recently proposed GPU-based two-dimensional explicit meshless method (Ma et al., 2014) by devising and implementing an efficient parallel LU-SGS implicit algorithm to further improve computational efficiency. The capability of the original 2D meshless code is extended to deal with 3D complex compressible flow problems. To resolve the inherent data dependency of the standard LU-SGS method, which causes race conditions between threads that destabilize the numerical computation, a generic rainbow coloring method is presented and applied to organize the computational points into different groups by painting neighboring points with different colors. The original LU-SGS method is modified and parallelized accordingly to perform calculations in a color-by-color manner. The CUDA Fortran programming model is employed to develop the key kernel functions that apply boundary conditions, calculate time steps, evaluate residuals, and advance and update the solution in time. A series of two- and three-dimensional test cases, including compressible flows over single- and multi-element airfoils and an M6 wing, are carried out to verify the developed code. The obtained solutions agree well with experimental data and other computational results reported in the literature. Detailed performance analysis reveals that the CPU-based implicit meshless method is four to eight times faster than its explicit counterpart, and that the computational efficiency of the implicit method can be improved by a further factor of ten to fifteen on the GPU.
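
A greedy coloring is one simple way to realize "painting neighboring points with different colors"; the paper's generic rainbow coloring may differ in detail. In the sketch below (hypothetical names; neighbours are given as an adjacency dictionary, as a meshless point cloud might provide), no two neighbouring points share a color, so an LU-SGS-style sweep could update all points of one color concurrently and then move on to the next color.

```python
def greedy_coloring(neighbors):
    """Give each point the smallest color not used by an already-colored neighbour."""
    colors = {}
    for p in sorted(neighbors):
        taken = {colors[q] for q in neighbors[p] if q in colors}
        c = 0
        while c in taken:
            c += 1
        colors[p] = c
    return colors


def color_groups(colors):
    """Group point ids by color; points within a group can be updated in parallel."""
    groups = {}
    for p, c in colors.items():
        groups.setdefault(c, []).append(p)
    return [sorted(groups[c]) for c in sorted(groups)]


def grid_neighbors(nx, ny):
    """Four-point stencil adjacency on an nx-by-ny grid (point id = i * ny + j)."""
    nbrs = {}
    for i in range(nx):
        for j in range(ny):
            p = i * ny + j
            candidates = (
                ((i - 1) * ny + j, i > 0),
                ((i + 1) * ny + j, i < nx - 1),
                (p - 1, j > 0),
                (p + 1, j < ny - 1),
            )
            nbrs[p] = [q for q, inside in candidates if inside]
    return nbrs
```

On a 3 x 3 grid with a four-point stencil, visiting points in row-major order yields the familiar red-black (checkerboard) pattern with two colors; irregular point clouds generally need more.
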

Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

1998-01-01

A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
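
The constraint of never splitting an implicit line across processors can be illustrated with a deliberately simplified heuristic: treat each line as an indivisible item weighted by its number of points, and assign lines, heaviest first, to the currently lightest partition (the longest-processing-time greedy rule). This is a swapped-in stand-in, not the paper's weighted partitioning strategy; it ignores mesh adjacency and therefore communication cost, and the function name `partition_lines` is hypothetical.

```python
import heapq


def partition_lines(line_weights, n_parts):
    """Assign whole implicit lines to partitions with the LPT greedy rule.

    Lines are never split, so no line solve crosses a partition boundary.
    Returns (owner, loads): owner[line] = partition, loads[p] = total weight.
    """
    heap = [(0, p) for p in range(n_parts)]  # (current load, partition id)
    heapq.heapify(heap)
    owner = {}
    order = sorted(range(len(line_weights)), key=lambda i: line_weights[i], reverse=True)
    for line in order:
        load, part = heapq.heappop(heap)  # lightest partition so far
        owner[line] = part
        heapq.heappush(heap, (load + line_weights[line], part))
    loads = [0] * n_parts
    for line, part in owner.items():
        loads[part] += line_weights[line]
    return owner, loads
```

For lines of 7, 5, 4, 3, 3, and 2 points on two processors, this rule happens to produce a perfect 12/12 split; in general it only approximates balance.
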
Flexible language constructs for large parallel programs

NASA Technical Reports Server (NTRS)

Rosing, Matthew; Schnabel, Robert

1993-01-01

The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.
The science of computing - The evolution of parallel processing

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

The present paper is concerned with the approaches to be employed to overcome the limitations in software technology that currently impede the effective use of parallel hardware technology. The process required to solve the resulting problems is found to involve four stages. At present, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.
Shared Memory Parallelization of an Implicit ADI-type CFD Code

NASA Technical Reports Server (NTRS)

Hauser, Th.; Huang, P. G.

1999-01-01

A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described, and performance measurements for the single- and multiprocessor implementations are summarized. The paper demonstrates that the optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared-memory machines to further enhance parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a friction Reynolds number Re_tau = 180 has shown good agreement with existing data.
HPF Implementation of ARC3D

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Yan, Jerry

1999-01-01

We present an HPF (High Performance Fortran) implementation of ARC3D code along with the profiling and performance data on SGI Origin 2000. Advantages and limitations of HPF as a parallel programming language for CFD applications are discussed. For achieving good performance results we used the data distributions optimized for implementation of implicit and explicit operators of the solver and boundary conditions. We compare the results with MPI and directive based implementations.
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Chiou, Jin-Chern

1990-01-01

Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equation (DAE) viewpoint. Constraint violations during the time integration process are minimized, and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion are treated with a two-stage staggered explicit-implicit numerical algorithm that takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, the present constraint treatment techniques and the two-stage staggered explicit-implicit numerical algorithm were efficiently implemented in parallel. The DAEs and the constraint treatment techniques were transformed into arrowhead matrices, from which a Schur complement form was derived. By fully exploiting sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient algorithm is used to solve the system equations written in Schur complement form. A software testbed was designed and implemented on both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speedup of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
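
The arrowhead structure is what makes the Schur complement attractive for parallel work. The sketch below uses scalar entries as a stand-in for the matrix blocks (an assumption for brevity) and solves the reduced equation directly instead of with the preconditioned conjugate gradient method used in the thesis. Each diagonal entry is eliminated independently, and only the Schur complement couples the unknowns. The function name `solve_arrowhead` is hypothetical.

```python
def solve_arrowhead(d, b, c, f, g):
    """Solve [[diag(d), b], [b^T, c]] [x; z] = [f; g] through the Schur complement.

    Scalars stand in for the blocks of a real multibody system. The per-entry
    eliminations are independent (parallelizable); only s couples them.
    """
    inv_d_b = [bi / di for bi, di in zip(b, d)]  # D^-1 b, entry by entry
    inv_d_f = [fi / di for fi, di in zip(f, d)]  # D^-1 f, entry by entry
    s = c - sum(bi * w for bi, w in zip(b, inv_d_b))  # Schur complement c - b^T D^-1 b
    rhs = g - sum(bi * w for bi, w in zip(b, inv_d_f))  # g - b^T D^-1 f
    z = rhs / s
    x = [wf - wb * z for wf, wb in zip(inv_d_f, inv_d_b)]  # back-substitute: x = D^-1 (f - b z)
    return x, z
```

For d = (4, 5, 6), b = (1, 2, 3), c = 20 and a right-hand side built from x = (1, 1, 1), z = 2, the routine recovers those values.
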
Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

NASA Technical Reports Server (NTRS)

Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

1992-01-01

A conference was held on parallel computational fluid dynamics, and related papers were produced. Topics discussed in these papers include parallel implicit and explicit solvers for compressible flow, parallel computational techniques for the Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation on massively parallel systems.
Flexible Language Constructs for Large Parallel Programs

DOE PAGES

Rosing, Matt; Schnabel, Robert

1994-01-01

The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.
Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

NASA Astrophysics Data System (ADS)

Chacon, L.; Finn, J. M.; Knoll, D. A.

2000-10-01

Recently, a new parallel velocity instability has been found (J. M. Finn, Phys. Plasmas 2, 12 (1995)). This mode is a tearing mode driven unstable by curvature effects and sound-wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a nonlinear study is of interest. Here, the linear and nonlinear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code (L. Chacon et al., "Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver," submitted to J. Comput. Phys., 2000) that includes viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions; they are implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix) and preconditioned with a "physics-based" preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability generates large poloidal shear flows and large magnetic islands even in regimes where the classical tearing mode is absolutely stable. For small viscosity, the time-asymptotic state can be turbulent.
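
The Jacobian-free Newton-Krylov idea in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' MHD solver: the test residual `toy_residual` is hypothetical, the finite-difference step `eps` uses a common heuristic rather than anything from the paper, and GMRES runs unpreconditioned (the physics-based preconditioner is omitted). What it shows is that the Krylov solver only ever asks for Jacobian-vector products, J(u)v ≈ (F(u + εv) - F(u))/ε, so the Jacobian is never formed or stored.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def gmres(matvec, b, tol=1e-12, max_iter=None):
    """Unrestarted GMRES (Arnoldi + Givens rotations); needs only matvec(v)."""
    n = len(b)
    m = max_iter or n
    beta = _norm(b)
    if beta == 0.0:
        return [0.0] * n
    V = [[x / beta for x in b]]                  # orthonormal Krylov basis
    H = [[0.0] * m for _ in range(m + 1)]        # Hessenberg matrix, (m+1) x m
    cs, sn = [0.0] * m, [0.0] * m
    g = [beta] + [0.0] * m                       # rotated right-hand side
    k_used = 0
    for k in range(m):
        w = matvec(V[k])
        for i in range(k + 1):                   # modified Gram-Schmidt
            H[i][k] = _dot(w, V[i])
            w = [wi - H[i][k] * vi for wi, vi in zip(w, V[i])]
        h = _norm(w)
        H[k + 1][k] = h
        for i in range(k):                       # apply earlier rotations to new column
            t = cs[i] * H[i][k] + sn[i] * H[i + 1][k]
            H[i + 1][k] = -sn[i] * H[i][k] + cs[i] * H[i + 1][k]
            H[i][k] = t
        r = math.hypot(H[k][k], H[k + 1][k])     # new rotation zeroes H[k+1][k]
        cs[k], sn[k] = H[k][k] / r, H[k + 1][k] / r
        H[k][k], H[k + 1][k] = r, 0.0
        g[k + 1] = -sn[k] * g[k]
        g[k] = cs[k] * g[k]
        k_used = k + 1
        if abs(g[k + 1]) <= tol * beta or h == 0.0:
            break
        V.append([x / h for x in w])
    y = [0.0] * k_used                           # back substitution on R y = g
    for i in reversed(range(k_used)):
        y[i] = (g[i] - sum(H[i][j] * y[j] for j in range(i + 1, k_used))) / H[i][i]
    return [sum(y[j] * V[j][p] for j in range(k_used)) for p in range(n)]


def jfnk(residual, u0, tol=1e-10, max_newton=30):
    """Newton's method whose linear solves use finite-difference J*v products."""
    u = list(u0)
    for _ in range(max_newton):
        F = residual(u)
        if _norm(F) < tol:
            break
        eps = 1e-7 * (1.0 + _norm(u))            # assumed perturbation heuristic

        def jv(v):                               # J(u) v ~ (F(u + eps v) - F(u)) / eps
            Fp = residual([ui + eps * vi for ui, vi in zip(u, v)])
            return [(a - b) / eps for a, b in zip(Fp, F)]

        du = gmres(jv, [-f for f in F])
        u = [ui + d for ui, d in zip(u, du)]
    return u


def toy_residual(u):
    """Hypothetical test problem: u_i^3 + 2 u_i - u_{i-1} - u_{i+1} - 1 = 0."""
    padded = [0.0] + list(u) + [0.0]             # homogeneous Dirichlet ends
    return [padded[i] ** 3 + 2.0 * padded[i] - padded[i - 1] - padded[i + 1] - 1.0
            for i in range(1, len(u) + 1)]
```

Calling `jfnk(toy_residual, [0.0] * 6)` drives the residual to round-off level in a handful of Newton steps. Each Krylov iteration costs one extra residual evaluation, which is why a good preconditioner matters so much in practice.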
A Second-Order Implicit Knowledge: Its Implications for E-Learning

ERIC Educational Resources Information Center

Noaparast, Khosrow Bagheri

2014-01-01

The dichotomous epistemology of explicit/implicit knowledge has led to two parallel lines of research: one emphasizes explicit knowledge, which has been the main road of e-learning, and the other takes implicit knowledge as the core of learning, which has shaped a critical line of thought on current e-learning. It is argued in this article…
Explicit pre-training instruction does not improve implicit perceptual-motor sequence learning

PubMed Central

Sanchez, Daniel J.; Reber, Paul J.

2012-01-01

Memory systems theory argues for separate neural systems supporting implicit and explicit memory in the human brain. Neuropsychological studies support this dissociation, but empirical studies of cognitively healthy participants generally observe that both kinds of memory are acquired to at least some extent, even in implicit learning tasks. A key question is whether this observation reflects parallel intact memory systems or an integrated representation of memory in healthy participants. Learning of complex tasks in which both explicit instruction and practice are used depends on both kinds of memory, and how these systems interact will be an important component of the learning process. Theories that posit an integrated, or single, memory system for both types of memory predict that explicit instruction should contribute directly to strengthening task knowledge. In contrast, if the two types of memory are independent and acquired in parallel, explicit knowledge should have no direct impact and may serve in a “scaffolding” role in complex learning. Using an implicit perceptual-motor sequence learning task, the effect of explicit pre-training instruction on skill learning and performance was assessed. Explicit pre-training instruction led to robust explicit knowledge, but sequence learning did not benefit from the contribution of pre-training sequence memorization. The lack of an instruction benefit suggests that during skill learning, implicit and explicit memory operate independently. While healthy participants will generally accrue parallel implicit and explicit knowledge in complex tasks, these types of information appear to be separately represented in the human brain, consistent with multiple memory systems theory. PMID:23280147
A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

PubMed

Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

2014-01-01

Solving fractional differential equations is very time-consuming. The computational complexity of the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We believe that parallel computing will become a basic method for computationally intensive fractional applications in the near future.
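
The N^2 factor comes from the memory of the time-fractional derivative: every implicit step has to sum over the whole solution history. A minimal sketch, assuming the widely used L1 approximation of a Caputo derivative of order 0 < α < 1 (the abstract does not say which discretization the authors use) and the hypothetical scalar test equation D^α u = -λu. On a 2D grid the same history sum is repeated at each of the M_x M_y points, which gives the O(M_x M_y N^2) total.

```python
import math


def l1_weights(n, alpha):
    """L1 weights b_j = (j+1)^(1-alpha) - j^(1-alpha), j = 0..n-1 (0 < alpha < 1)."""
    return [(j + 1) ** (1 - alpha) - j ** (1 - alpha) for j in range(n)]


def caputo_l1(u, dt, alpha):
    """L1 approximation of the Caputo derivative at t_n from the history u[0..n]."""
    n = len(u) - 1
    b = l1_weights(n, alpha)
    scale = dt ** (-alpha) / math.gamma(2 - alpha)
    return scale * sum(b[j] * (u[n - j] - u[n - j - 1]) for j in range(n))


def solve_relaxation(u0, lam, alpha, dt, steps):
    """Implicit L1 scheme for D^alpha u = -lam * u; O(steps^2) work overall."""
    c = dt ** (-alpha) / math.gamma(2 - alpha)
    b = l1_weights(steps, alpha)
    u = [u0]
    for n in range(1, steps + 1):
        # history term: every earlier increment contributes, so O(n) work per step
        hist = sum(b[j] * (u[n - j] - u[n - j - 1]) for j in range(1, n))
        # c*(u_n - u_{n-1}) + c*hist = -lam*u_n  (b_0 = 1), solved for u_n
        u.append(c * (u[n - 1] - hist) / (c + lam))
    return u
```

A quick sanity check is that the L1 formula is exact for u(t) = t, whose Caputo derivative is t^(1-α)/Γ(2-α).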
Parallel and Portable Monte Carlo Particle Transport

NASA Astrophysics Data System (ADS)

Lee, S. R.; Cummings, J. C.; Nolen, S. D.; Keen, N. D.

1997-08-01

We have developed a multi-group, Monte Carlo neutron transport code in C++ using object-oriented methods and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α eigenvalues of the neutron transport equation on a rectilinear computational mesh. It is portable to and runs in parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities are discussed, along with physics and performance results for several test problems on a variety of hardware, including all three Accelerated Strategic Computing Initiative (ASCI) platforms. Current parallel performance indicates the ability to compute α-eigenvalues in seconds or minutes rather than days or weeks. Current and future work on the implementation of a general transport physics framework (TPF) is also described. This TPF employs modern C++ programming techniques to provide simplified user interfaces, generic STL-style programming, and compile-time performance optimization. Physics capabilities of the TPF will be extended to include continuous energy treatments, implicit Monte Carlo algorithms, and a variety of convergence acceleration techniques such as importance combing.
Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions

NASA Astrophysics Data System (ADS)

Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey

2014-01-01

The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH

NASA Astrophysics Data System (ADS)

Lee, D.; Gopal, S.; Mohapatra, P.

2012-07-01

We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
Implicit Coupling Approach for Simulation of Charring Carbon Ablators

NASA Technical Reports Server (NTRS)

Chen, Yih-Kanq; Gokcen, Tahir

2013-01-01

This study demonstrates that coupling of a material thermal response code and a flow solver with nonequilibrium gas/surface interaction for simulation of charring carbon ablators can be performed using an implicit approach. The material thermal response code used in this study is the three-dimensional version of the Fully Implicit Ablation and Thermal response program, which predicts charring material thermal response and shape change on hypersonic space vehicles. The flow code solves the reacting Navier-Stokes equations using the Data-Parallel Line Relaxation method. Coupling between the material response and flow codes is performed by solving the surface mass balance in the flow solver and the surface energy balance in the material response code. Thus, the material surface recession is predicted in the flow code, and the surface temperature and pyrolysis gas injection rate are computed in the material response code. It is demonstrated that the time-lagged explicit approach is sufficient for simulations at low surface heating conditions, in which the surface ablation rate is not a strong function of the surface temperature. At elevated surface heating conditions, the implicit approach has to be taken, because the carbon ablation rate becomes a stiff function of the surface temperature, and the explicit approach then appears to be inappropriate, resulting in severe numerical oscillations of the predicted surface temperature. Implicit coupling for simulation of arc-jet models is performed, and the predictions are compared with measured data. Implicit coupling for trajectory-based simulation of the Stardust fore-body heat shield is also conducted. The predicted stagnation point total recession is compared with that predicted using the chemical equilibrium surface assumption.
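
Why a time-lagged explicit update oscillates once the coupling becomes stiff can be seen on a deliberately simplified model. The sketch below is not the FIAT/DPLR coupling: it assumes a hypothetical linear relaxation dT/dt = -k (T - T_eq), where a large rate constant k stands in for an ablation rate that depends steeply on surface temperature. Forward (explicit) Euler multiplies the error by (1 - kΔt) each step, so the error alternates in sign for kΔt > 1 and grows for kΔt > 2; backward (implicit) Euler divides it by (1 + kΔt) and decays monotonically for any Δt.

```python
def explicit_step(T, dt, k, T_eq):
    """Forward Euler: rate evaluated at the old (time-lagged) temperature."""
    return T - dt * k * (T - T_eq)


def implicit_step(T, dt, k, T_eq):
    """Backward Euler: rate evaluated at the new temperature (linear, solved exactly)."""
    return (T + dt * k * T_eq) / (1.0 + dt * k)


def march(step, T0, dt, k, T_eq, n):
    """Apply `step` n times and return the whole temperature history."""
    history = [T0]
    for _ in range(n):
        history.append(step(history[-1], dt, k, T_eq))
    return history
```

With the hypothetical values T0 = 3000, T_eq = 2000, k = 50 and Δt = 0.05 (so kΔt = 2.5), the explicit history swings back and forth across T_eq with growing amplitude, while the implicit one approaches T_eq from above.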
Convergence issues in domain decomposition parallel computation of hovering rotor

NASA Astrophysics Data System (ADS)

Xiao, Zhongyun; Liu, Gang; Mou, Bin; Jiang, Xiong

2018-05-01

The implicit LU-SGS time integration algorithm has been widely used in parallel computation despite its lack of information from adjacent domains. When applied to parallel computation of hovering rotor flows in a rotating frame, it causes convergence issues. To remedy the problem, three LU factorization-based implicit schemes (LU-SGS, DP-LUR, and HLU-SGS) are investigated comparatively. A test case of pure grid rotation is designed to verify these algorithms; it shows that the LU-SGS algorithm introduces errors on boundary cells. When partition boundaries are circumferential, these errors arise in proportion to grid speed, accumulate with the rotation, and ultimately lead to computational failure. In contrast, the DP-LUR and HLU-SGS methods show good convergence owing to their boundary treatment, which makes them desirable for domain decomposition parallel computations.
Expressing Parallelism with ROOT

NASA Astrophysics Data System (ADS)

Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

2017-10-01

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Piparo, D.; Tejedor, E.; Guiraud, E.

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel, and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations that are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower-Upper Symmetric Gauss-Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier-Stokes (TURNS) code. The new hybrid LU-SGS scheme couples the point-relaxation approach of the Data-Parallel Lower-Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the symmetric Gauss-Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using the Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multiprocessors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It is also more robust than DP-LUR for third-order upwind solutions. The second part of this work examines the use of Krylov subspace iterative solvers within a Newton method for the nonlinear CFD solutions, with the hybrid LU-SGS scheme used as a parallelizable preconditioner. Two iterative methods are tested: Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR).
The Newton method demonstrates good parallel performance on the IBM SP2, with OSGCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated, but the overall solution time remains about the same as with the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time accuracy, which allows the use of larger time steps and results in CPU savings of 20-35%.
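
The structure of the hybrid scheme (Jacobi-style point relaxation across processor boundaries, symmetric Gauss-Seidel sweeps inside each processor's block) can be illustrated on a linear model problem. This is a hedged sketch, not the TURNS operator: it assumes a 1D constant-coefficient tridiagonal system and treats each index range in `blocks` as one hypothetical processor. Because every block reads its neighbours' values from the previous iterate, the blocks could be updated concurrently, and the result does not depend on the order in which they are processed.

```python
def _value(u, j):
    """Value at index j, with homogeneous Dirichlet ghost values outside [0, n)."""
    return u[j] if 0 <= j < len(u) else 0.0


def hybrid_sgs(diag, off, f, blocks, tol=1e-10, max_iter=20000):
    """Solve diag*u_i + off*(u_{i-1} + u_{i+1}) = f_i by hybrid relaxation.

    Inside each block: one forward and one backward Gauss-Seidel sweep.
    Across blocks: values from the previous iterate (Jacobi-style coupling).
    Returns (solution, iterations used).
    """
    n = len(f)
    u = [0.0] * n
    for it in range(1, max_iter + 1):
        old = u[:]                               # snapshot "exchanged" between processors
        for lo, hi in blocks:
            def nb(j):
                # on-block neighbours use fresh values; off-block ones are lagged
                return u[j] if lo <= j < hi else _value(old, j)
            for i in list(range(lo, hi)) + list(range(hi - 1, lo - 1, -1)):
                u[i] = (f[i] - off * (nb(i - 1) + nb(i + 1))) / diag
        res = max(abs(f[i] - diag * u[i] - off * (_value(u, i - 1) + _value(u, i + 1)))
                  for i in range(n))
        if res < tol:
            return u, it
    return u, max_iter
```

For diag = 2, off = -1 and f = 1 on 12 unknowns (a discrete -u'' = 1), the exact solution is u_i = (i+1)(12-i)/2, which the iteration reproduces whichever order the three blocks are visited in.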

Parallel Implicit Runge-Kutta Methods Applied to Coupled Orbit/Attitude Propagation

NASA Astrophysics Data System (ADS)

Hatten, Noble; Russell, Ryan P.

2017-12-01

A variable-step Gauss-Legendre implicit Runge-Kutta (GLIRK) propagator is applied to coupled orbit/attitude propagation. Concepts previously shown to improve efficiency in 3DOF propagation are modified and extended to the 6DOF problem, including the use of variable-fidelity dynamics models. The impact of computing the stage dynamics of a single step in parallel is examined using up to 23 threads and 22 associated GLIRK stages; one thread is reserved for an extra dynamics function evaluation used in the estimation of the local truncation error. Efficiency is found to peak for typical examples when using approximately 8 to 12 stages for both serial and parallel implementations. Accuracy and efficiency compare favorably to explicit Runge-Kutta and linear-multistep solvers for representative scenarios. However, linear-multistep methods are found to be more efficient for some applications, particularly in a serial computing environment, or when parallelism can be applied across multiple trajectories.
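
The parallelism examined here lives inside a single implicit step: if the stage equations are solved by fixed-point iteration, each stage derivative in one sweep depends only on the previous sweep, so the s dynamics evaluations can run concurrently. A minimal sketch, assuming the 2-stage (fourth-order) Gauss-Legendre tableau, a fixed step size, and a thread pool standing in for the threads used in the study; the variable-step control and the extra error-estimation evaluation are omitted. In CPython the global interpreter lock limits any speedup for a pure-Python `f`; the structure, not the timing, is the point.

```python
import math
from concurrent.futures import ThreadPoolExecutor

_R3 = math.sqrt(3.0)
A = [[0.25, 0.25 - _R3 / 6.0],
     [0.25 + _R3 / 6.0, 0.25]]                   # 2-stage Gauss-Legendre Butcher matrix
B = [0.5, 0.5]
C = [0.5 - _R3 / 6.0, 0.5 + _R3 / 6.0]


def gl2_step(f, t, y, h, pool, tol=1e-14, max_iter=50):
    """One Gauss-Legendre IRK step; stage equations solved by fixed-point iteration."""
    s, d = len(B), len(y)
    k = [f(t, y)] * s                            # initial guess: same slope for every stage
    for _ in range(max_iter):
        stage_y = [[y[m] + h * sum(A[i][j] * k[j][m] for j in range(s)) for m in range(d)]
                   for i in range(s)]
        # the s evaluations below are independent, so they are dispatched concurrently
        new_k = list(pool.map(lambda i: f(t + C[i] * h, stage_y[i]), range(s)))
        delta = max(abs(a - b) for kn, ko in zip(new_k, k) for a, b in zip(kn, ko))
        k = new_k
        if delta < tol:
            break
    return [y[m] + h * sum(B[i] * k[i][m] for i in range(s)) for m in range(d)]


def integrate(f, y0, t0, h, steps):
    """Fixed-step integration with one thread per stage."""
    y, t = list(y0), t0
    with ThreadPoolExecutor(max_workers=len(B)) as pool:
        for _ in range(steps):
            y = gl2_step(f, t, y, h, pool)
            t += h
    return y
```

Gauss-Legendre methods conserve quadratic invariants, so on a harmonic oscillator the energy stays constant up to the fixed-point tolerance, a useful check for orbit-like problems.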
Implicit solution of Navier-Stokes equations on staggered curvilinear grids using a Newton-Krylov method with a novel analytical Jacobian.

NASA Astrophysics Data System (ADS)

Borazjani, Iman; Asgharzadeh, Hafez

2015-11-01

Flow simulations involving complex geometries and moving boundaries suffer from time-step size restrictions and low convergence rates with explicit and semi-implicit schemes. Implicit schemes can be used to overcome these restrictions. However, implementing an implicit solver for nonlinear equations, including the Navier-Stokes equations, is not straightforward. Newton-Krylov subspace methods (NKMs) are among the most advanced iterative methods for solving nonlinear equations such as the implicit discretization of the Navier-Stokes equations. The efficiency of NKMs depends heavily on how the Jacobian is formed: automatic differentiation is very expensive, and matrix-free methods slow down as the mesh is refined. An analytical Jacobian is inexpensive, but deriving the analytical Jacobian of the Navier-Stokes equations on a staggered grid is challenging. The NKM with a novel analytical Jacobian was developed and validated against the Taylor-Green vortex and pulsatile flow in a 90-degree bend. The developed method successfully handled complex geometries, such as an intracranial aneurysm with multiple overset grids, and immersed boundaries. The NKM with an analytical Jacobian is shown to be 3 to 25 times faster than the fixed-point implicit Runge-Kutta method, and more than 100 times faster than automatic differentiation, depending on the grid (size) and the flow problem. The developed methods are fully parallelized, with parallel efficiency of 80-90% on the problems tested.
The Relationship of Explicit-Implicit Evaluative Discrepancy to Exercise Dropout in Middle-Aged Adults.

PubMed

Berry, Tanya R; Rodgers, Wendy M; Divine, Alison; Hall, Craig

2018-06-19

Discrepancies between automatically activated associations (i.e., implicit evaluations) and explicit evaluations of motives (measured with a questionnaire) could lead to greater information processing to resolve discrepancies or self-regulatory failures that may affect behavior. This research examined the relationship of health and appearance exercise-related explicit-implicit evaluative discrepancies, the interaction between implicit and explicit evaluations, and the combined value of explicit and implicit evaluations (i.e., the summed scores) to dropout from a yearlong exercise program. Participants (N = 253) completed implicit health and appearance measures and explicit health and appearance motives at baseline, prior to starting the exercise program. The sum of implicit and explicit appearance measures was positively related to weeks in the program, and discrepancy between the implicit and explicit health measures was negatively related to length of time in the program. Implicit exercise evaluations and their relationships to oft-cited motives such as appearance and health may inform exercise dropout.
Adaptive implicit-explicit and parallel element-by-element iteration schemes

NASA Technical Reports Server (NTRS)

Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.

1989-01-01

Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.
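
The grouping step in GEBE amounts to partitioning elements so that no two elements in the same group share a node; element contributions within a group can then be accumulated without write conflicts. A minimal sketch, assuming a simple greedy first-fit grouping (the abstract does not describe the paper's grouping strategy) and, for the check, hypothetical 1D two-node elements that all share the local matrix [[1, -1], [-1, 1]].

```python
def group_elements(elements):
    """Greedy first-fit: put each element in the first group sharing none of its nodes."""
    groups = []                                  # list of (element ids, nodes already used)
    for eid, nodes in enumerate(elements):
        for ids, used in groups:
            if used.isdisjoint(nodes):
                ids.append(eid)
                used.update(nodes)
                break
        else:
            groups.append(([eid], set(nodes)))
    return [ids for ids, _ in groups]


def ebe_matvec(elements, local_matrix, x, groups):
    """Element-by-element y = A x without assembling A, processed group by group."""
    y = [0.0] * len(x)
    for group in groups:
        # elements in one group touch disjoint nodes, so this loop could run in parallel
        for eid in group:
            nodes = elements[eid]
            for a, na in enumerate(nodes):
                y[na] += sum(local_matrix[a][b] * x[nb] for b, nb in enumerate(nodes))
    return y
```

For a chain of six 1D elements the greedy pass yields the familiar two-colour split (even and odd elements), and the grouped product matches the assembled stiffness matrix applied to x.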
Parallel computing techniques for rotorcraft aerodynamics

NASA Astrophysics Data System (ADS)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, the implicit operator originally used in TURNS, Lower-Upper Symmetric Gauss-Seidel (LU-SGS), is modified. Second, an inexact Newton method coupled with a Krylov subspace iterative method (a Newton-Krylov method) is applied. Both techniques had been tried previously for the Euler-equations mode of the code; in this work, we extend them to the Navier-Stokes mode. Several new implicit operators were tried because traditional operators have convergence problems on the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. Efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS requires efficient preconditioners. The parallel implicit operators from the previous step are employed as preconditioners, and the results are compared. The Message Passing Interface (MPI) protocol was used because of its portability to various parallel architectures. The proposed methodology is general and can be applied to other CFD codes (e.g., OVERFLOW).
Radiation-MHD Simulations of Pillars and Globules in HII Regions

NASA Astrophysics Data System (ADS)

Mackey, J.

2012-07-01

Implicit and explicit raytracing-photoionisation algorithms have been implemented in the author's radiation-magnetohydrodynamics code. The algorithms are described briefly and their efficiency and parallel scaling are investigated. The implicit algorithm is more efficient for calculations where ionisation fronts have very supersonic velocities, and the explicit algorithm is favoured in the opposite limit because of its better parallel scaling. The implicit method is used to investigate the effects of initially uniform magnetic fields on the formation and evolution of dense pillars and cometary globules at the boundaries of HII regions. It is shown that for weak and medium field strengths an initially perpendicular field is swept into alignment with the pillar during its dynamical evolution, matching magnetic field observations of the ‘Pillars of Creation’ in M16. A strong perpendicular magnetic field remains in its initial configuration and also confines the photoevaporation flow into a bar-shaped, dense, ionised ribbon which partially shields the ionisation front.
Thermal Ablation Modeling for Silicate Materials

NASA Technical Reports Server (NTRS)

Chen, Yih-Kanq

2016-01-01

A general thermal ablation model for silicates is proposed. The model includes the mass losses through the balance between evaporation and condensation, and through the moving molten layer driven by surface shear force and pressure gradient. This model can be applied in the ablation simulation of the meteoroid and the glassy ablator for spacecraft Thermal Protection Systems. Time-dependent axisymmetric computations are performed by coupling the fluid dynamics code, Data-Parallel Line Relaxation program, with the material response code, Two-dimensional Implicit Thermal Ablation simulation program, to predict the mass loss rates and shape change. The predicted mass loss rates will be compared with available data for model validation, and parametric studies will also be performed for meteoroid Earth entry conditions.
Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD

NASA Technical Reports Server (NTRS)

Gropp, W. D.; Keyes, D. E.; McInnes, L. C.; Tidriri, M. D.

1998-01-01

Implicit solution methods are important in applications modeled by PDEs with disparate temporal and spatial scales. Because such applications require high resolution with reasonable turnaround, "routine" parallelization is essential. The pseudo-transient matrix-free Newton-Krylov-Schwarz (Psi-NKS) algorithmic framework is presented as an answer. We show that, for the classical problem of three-dimensional transonic Euler flow about an M6 wing, Psi-NKS can simultaneously deliver: globalized, asymptotically rapid convergence through adaptive pseudo-transient continuation and Newton's method; reasonable parallelizability for an implicit method through deferred synchronization and favorable communication-to-computation scaling in the Krylov linear solver; and high per-processor performance through attention to distributed memory and cache locality, especially through the Schwarz preconditioner. Two discouraging features of Psi-NKS methods are their sensitivity to the coding of the underlying PDE discretization and the large number of parameters that must be selected to govern convergence. We therefore distill several recommendations from our experience and from our reading of the literature on various algorithmic components of Psi-NKS, and we describe a freely available, MPI-based portable parallel software implementation of the solver employed here.
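
The pseudo-transient continuation layer of Psi-NKS can be illustrated without the Krylov and Schwarz layers. A minimal sketch, assuming a scalar residual (so each linear solve is a division) and the switched evolution relaxation (SER) rule Δt_k = Δt_0 |F(u_0)| / |F(u_k)|, a common choice of adaptive time-step law; in the paper's setting each step would instead be an inexact matrix-free Newton-Krylov solve with a Schwarz preconditioner. Early on, the small Δt makes every update a cautious, damped step; as the residual falls, Δt grows and the iteration approaches pure Newton.

```python
def ptc_solve(F, dF, u0, dt0=0.1, tol=1e-12, max_steps=100):
    """Pseudo-transient continuation with SER time steps for a scalar F(u) = 0.

    Returns (u, residual history).
    """
    u = u0
    r0 = abs(F(u0))
    history = []
    for _ in range(max_steps):
        r = F(u)
        history.append(abs(r))
        if abs(r) < tol:
            break
        dt = dt0 * r0 / abs(r)                   # SER: time step grows as residual falls
        # linearized backward-Euler pseudo-time step: (1/dt + F'(u)) du = -F(u)
        u -= r / (1.0 / dt + dF(u))
    return u, history
```

On the hypothetical residual F(u) = u^3 + u - 10 started from u = 0, the damped early steps approach the root u = 2 without overshoot, and the final steps converge at essentially Newton speed.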
Parallelized implicit propagators for the finite-difference Schrödinger equation

NASA Astrophysics Data System (ADS)

Parker, Jonathan; Taylor, K. T.

1995-08-01

We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
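
The mixed direct-iterative structure described above can be sketched on a tridiagonal model: each diagonal block is solved directly (here with the Thomas algorithm, i.e. LU decomposition specialised to tridiagonal blocks), and the coupling between blocks is handled iteratively. A hedged illustration, assuming a constant-coefficient, diagonally dominant tridiagonal matrix in place of the Schrödinger Hamiltonian. Block Jacobi updates every block from the previous iterate, so the blocks can be solved in parallel; block Gauss-Seidel uses freshly updated neighbours and typically needs fewer iterations.

```python
def thomas(sub, diag, sup, rhs):
    """Direct solve of a tridiagonal system (LU without pivoting)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def block_iterate(n, diag, off, f, size, gauss_seidel, tol=1e-12, max_iter=1000):
    """Block Jacobi (gauss_seidel=False) or block Gauss-Seidel on a tridiagonal matrix.

    Returns (solution, iterations used).
    """
    u = [0.0] * n
    blocks = [(lo, min(lo + size, n)) for lo in range(0, n, size)]
    for it in range(1, max_iter + 1):
        old = u[:]
        for lo, hi in blocks:
            src = u if gauss_seidel else old     # where neighbouring block values come from
            rhs = f[lo:hi]                       # slicing makes a copy we can modify
            rhs[0] -= off * (src[lo - 1] if lo > 0 else 0.0)
            rhs[-1] -= off * (src[hi] if hi < n else 0.0)
            m = hi - lo
            u[lo:hi] = thomas([off] * m, [diag] * m, [off] * m, rhs)
        if max(abs(a - b) for a, b in zip(u, old)) < tol:
            return u, it
    return u, max_iter
```

On a 20-unknown system with diagonal 4 and off-diagonals -1, split into blocks of five, both variants reach the same solution and the Gauss-Seidel ordering needs fewer sweeps.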
White and Black American Children’s Implicit Intergroup Bias

PubMed Central

Newheiser, Anna-Kaisa; Olson, Kristina R.

2011-01-01

Despite a decline in explicit prejudice, adults and children from majority groups (e.g., White Americans) often express bias implicitly, as assessed by the Implicit Association Test. In contrast, minority-group (e.g., Black American) adults on average show no bias on the IAT. In the present research, representing the first empirical investigation of whether Black children’s IAT responses parallel those of Black adults, we examined implicit bias in 7–11-year-old White and Black American children. Replicating previous findings with adults, whereas White children showed a robust ingroup bias, Black children showed no bias. Additionally, we investigated the role of valuing status in the development of implicit bias. For Black children, explicit preference for high status predicted implicit outgroup bias: Black children who explicitly expressed high preference for rich (vs. poor) people showed an implicit preference for Whites comparable in magnitude to White children’s ingroup bias. Implications for research on intergroup bias are discussed. PMID:22184478
Development of a fully implicit particle-in-cell scheme for gyrokinetic electromagnetic turbulence simulation in XGC1

NASA Astrophysics Data System (ADS)

Ku, Seung-Hoe; Hager, R.; Chang, C. S.; Chacon, L.; Chen, G.; EPSI Team

2016-10-01

The cancelation problem has been a long-standing issue for long-wavelength modes in electromagnetic gyrokinetic PIC simulations in toroidal geometry. In an attempt to resolve this issue, we implemented a fully implicit time integration scheme in the full-f gyrokinetic PIC code XGC1. The new scheme, based on the implicit Vlasov-Darwin PIC algorithm by G. Chen and L. Chacon, can potentially resolve the cancelation problem. The time advance for the field and particle equations is space-time-centered, with particle sub-cycling. The resulting system of equations is solved by a Picard iteration solver with a fixed-point accelerator. The algorithm is implemented in the parallel velocity formalism instead of the canonical parallel momentum formalism. XGC1 specializes in simulating the tokamak edge plasma with magnetic separatrix geometry. A fully implicit scheme could enable accurate and efficient gyrokinetic simulations. We will test whether this numerical scheme overcomes the cancelation problem and reproduces the dispersion relation of Alfven waves and tearing modes in cylindrical geometry. Funded by US DOE FES and ASCR; computing resources provided by OLCF through ALCC.
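
The abstract mentions a Picard iteration with a fixed-point accelerator but does not name the accelerator. As a hedged illustration only, the sketch below uses Anderson acceleration with a history depth of one, a common accelerator for Picard iterations (whether XGC1 uses this exact variant is an assumption), applied to the hypothetical component-wise map g(x) = cos(x). Each accelerated update mixes the last two map evaluations with a least-squares weight θ chosen to minimise the combined fixed-point residual.

```python
import math


def picard(g, x0, tol=1e-12, max_iter=500):
    """Plain fixed-point iteration x <- g(x); returns (x, iterations)."""
    x = list(x0)
    for it in range(1, max_iter + 1):
        gx = g(x)
        if max(abs(a - b) for a, b in zip(gx, x)) < tol:
            return gx, it
        x = gx
    return x, max_iter


def anderson1(g, x0, tol=1e-12, max_iter=500):
    """Picard iteration with depth-1 Anderson acceleration; returns (x, iterations)."""
    x = list(x0)
    g_prev = f_prev = None
    for it in range(1, max_iter + 1):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]       # fixed-point residual g(x) - x
        if max(abs(v) for v in f) < tol:
            return gx, it
        if f_prev is None:
            x_new = gx                           # first step: plain Picard
        else:
            df = [a - b for a, b in zip(f, f_prev)]
            denom = sum(v * v for v in df)
            # theta minimises ||f - theta * df||_2 over the scalar theta
            theta = sum(a * b for a, b in zip(f, df)) / denom if denom > 0 else 0.0
            x_new = [a - theta * (a - b) for a, b in zip(gx, g_prev)]
        g_prev, f_prev, x = gx, f, x_new
    return x, max_iter
```

Both iterations converge to the fixed point of cos(x) (about 0.739085), but plain Picard contracts by only about 0.67 per step here, whereas the accelerated version behaves like a secant method and needs far fewer map evaluations.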
A Parallel Implicit Reconstructed Discontinuous Galerkin Method for Compressible Flows on Hybrid Grids

NASA Astrophysics Data System (ADS)

Xia, Yidong

The objective of this work is to develop a parallel, implicit reconstructed discontinuous Galerkin (RDG) method using a Taylor basis for the solution of the compressible Navier-Stokes equations on 3D hybrid grids. This third-order accurate RDG method is based on a hierarchical weighted essentially non-oscillatory reconstruction scheme, termed HWENO(P1P2) to indicate that a quadratic polynomial solution is obtained from the underlying linear polynomial DG solution via a hierarchical WENO reconstruction. The HWENO(P1P2) scheme is designed not only to enhance the accuracy of the underlying DG(P1) method but also to ensure the nonlinear stability of the RDG method. In this reconstruction scheme, a quadratic polynomial (P2) solution is first reconstructed using a least-squares approach from the underlying linear (P1) discontinuous Galerkin solution. The final quadratic solution is then obtained using a Hermite WENO reconstruction, which is necessary to ensure the linear stability of the RDG method on 3D unstructured grids. The first derivatives of the quadratic polynomial solution are then reconstructed using a WENO reconstruction in order to eliminate spurious oscillations in the vicinity of strong discontinuities, thus ensuring the nonlinear stability of the RDG method. The parallelization in the RDG method is based on a message passing interface (MPI) programming paradigm, where the METIS library is used for partitioning a mesh into subdomain meshes of approximately the same size. Both multi-stage explicit Runge-Kutta and simple implicit backward Euler methods are implemented for time advancement in the RDG method. In the implicit method, three approaches (analytical differentiation, divided differencing (DD), and automatic differentiation (AD)) are developed and implemented to obtain the resulting flux Jacobian matrices.
Automatic differentiation is a set of techniques based on the mechanical application of the chain rule to obtain derivatives of a function given as a computer program. Using an AD tool can significantly reduce the effort needed to derive the flux Jacobians, which, depending on the complexity of the numerical flux scheme, can be complicated, tedious, and error-prone to derive by hand or with symbolic algebra software. In addition, the code-maintenance workload is greatly reduced when the underlying flux scheme is updated. The approximate system of linear equations arising from the Newton linearization is solved by the generalized minimal residual (GMRES) algorithm with lower-upper symmetric Gauss-Seidel (LU-SGS) preconditioning. This GMRES+LU-SGS linear solver is most robust and efficient for implicit time integration of the discretized Navier-Stokes equations when AD-based flux Jacobians are provided rather than those from the other two approaches. The developed HWENO(P1P2) method is used to compute a variety of well-documented compressible inviscid and viscous flow test cases on 3D hybrid grids, including standard benchmarks such as the Sod shock tube, flow past a circular cylinder, and laminar flow past a flat plate. The computed solutions are compared with analytical solutions or experimental data, where available, to assess the accuracy of the HWENO(P1P2) method. Numerical results demonstrate that the HWENO(P1P2) method is able not only to enhance the accuracy of the underlying DG(P1) method but also to ensure linear and non-linear stability in the presence of strong discontinuities.
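The contrast between divided differencing and automatic differentiation can be illustrated with a hypothetical Python sketch for the 1D Euler flux F(U) = (ρu, ρu² + p, (E + p)u), with p = (γ - 1)(E - ½ρu²) and γ = 1.4 assumed. Forward-mode AD is implemented here with a small dual-number class (an illustrative helper, not the AD tool used in this work): it yields Jacobian entries exact to round-off, whereas divided differencing carries a step-size-dependent truncation error.

```python
from typing import List, Sequence

GAMMA = 1.4  # assumed ratio of specific heats


class Dual:
    """Dual number val + der*eps with eps**2 = 0: a value and one directional derivative."""

    __slots__ = ("val", "der")

    def __init__(self, val: float, der: float = 0.0) -> None:
        self.val = val
        self.der = der

    @staticmethod
    def lift(x) -> "Dual":
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):  # product rule
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, o):  # quotient rule
        o = Dual.lift(o)
        return Dual(self.val / o.val, (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, o):
        return Dual.lift(o) / self


def euler_flux(U: Sequence) -> list:
    """1D Euler flux for U = (rho, rho*u, E); works on floats or Duals."""
    rho, m, E = U
    u = m / rho
    p = (GAMMA - 1.0) * (E - 0.5 * m * u)
    return [m, m * u + p, (E + p) * u]


def jacobian_ad(U: Sequence[float]) -> List[List[float]]:
    """Forward-mode AD: seed input j with derivative 1 and read off column j."""
    n = len(U)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        F = euler_flux([Dual(U[i], 1.0 if i == j else 0.0) for i in range(n)])
        for i in range(n):
            J[i][j] = F[i].der
    return J


def jacobian_dd(U: Sequence[float], h: float = 1e-7) -> List[List[float]]:
    """Divided differencing: one-sided difference per column (truncation error O(h))."""
    n = len(U)
    F0 = euler_flux(list(U))
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        Up = list(U)
        Up[j] += h
        Fp = euler_flux(Up)
        for i in range(n):
            J[i][j] = (Fp[i] - F0[i]) / h
    return J
```

In an actual RDG code the AD tool would be applied to the full numerical flux routine rather than to the physical flux alone; the dual-number class only shows why AD Jacobians carry no step-size error.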
An extensive grid convergence study on various element types (tetrahedron, prism, hexahedron, and hybrid prism/hexahedron) for a number of test cases indicates that the developed HWENO(P1P2) method achieves the designed third-order spatial accuracy for smooth inviscid flows, one order higher than the underlying second-order DG(P1) method, without a significant increase in computing cost or storage requirements. The performance of the developed GMRES+LU-SGS implicit method is compared with that of the multi-stage Runge-Kutta time-stepping scheme for a number of test cases in terms of time step and CPU time. Numerical results indicate that the overall performance of the implicit method with AD-based Jacobians is an order of magnitude better than that of its explicit counterpart. Finally, a set of parallel scaling tests for both the explicit and implicit methods is conducted on North Carolina State University's ARC cluster, demonstrating nearly ideal scalability of the RDG method. (Abstract shortened by UMI.)
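A grid-convergence claim of this kind is usually checked by computing the observed order of accuracy between successive meshes, p = ln(e_i / e_{i+1}) / ln(h_i / h_{i+1}). The Python sketch below assumes a representative mesh size h = N^(-1/3) for N cells in 3D, a common convention for unstructured grids; the error values are synthetic, not data from this study.

```python
import math
from typing import List, Sequence


def representative_h(n_cells: int, dim: int = 3) -> float:
    """Assumed mesh-size measure for unstructured grids: h = N**(-1/dim)."""
    return n_cells ** (-1.0 / dim)


def observed_orders(h: Sequence[float], err: Sequence[float]) -> List[float]:
    """Observed order between successive grids: p = ln(e_i / e_{i+1}) / ln(h_i / h_{i+1})."""
    return [math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
            for i in range(len(h) - 1)]


if __name__ == "__main__":
    # Synthetic errors e = C * h**3 (made-up data, not results from the study).
    cells = [1_000, 8_000, 64_000]
    hs = [representative_h(n) for n in cells]
    errs = [0.5 * h ** 3 for h in hs]
    print(observed_orders(hs, errs))  # approximately [3.0, 3.0]
```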
The novel implicit LU-SGS parallel iterative method based on the diffusion equation of a nuclear reactor on a GPU cluster

NASA Astrophysics Data System (ADS)

Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya

2017-02-01

GPUs are used not only in graphics but also, increasingly, in fields that require large numbers of numerical calculations. In the energy industry, nuclear energy cannot easily be replaced by other energy sources because of its low carbon emissions, high energy density, long duration, and other characteristics. Core fuel management is one of the major areas of concern in a nuclear power plant, and it is directly related to the economic benefits and cost of nuclear power. Because the diffusion equation for a large-scale reactor core is large and complicated, its calculation is crucial in the core fuel management process. In this paper, we use CUDA programming on a GPU cluster to run the LU-SGS parallel iterative calculation for the reactor diffusion equation. We divide one- and two-dimensional meshes into multiple domains, with the domains distributed evenly across GPU blocks. A parallel collision scheme is proposed that defines virtual grid boundaries for exchanging information, with data transmitted by non-stop collision. Compared with the serial program, the experiments show that the GPU greatly improves the efficiency of program execution, confirming that GPUs play an increasingly important role in numerical computation.
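The LU-SGS idea can be illustrated on a backward-Euler step for the 1D diffusion equation, which gives a tridiagonal system with diagonal 1 + 2r and off-diagonals -r (r = DΔt/Δx², assumed constant, with zero Dirichlet boundaries). The serial Python sketch below is not the authors' CUDA code: it approximates A ≈ (D + L) D⁻¹ (D + U) and applies that factorization through one forward and one backward sweep per iteration. Each sweep is a recurrence (y[i] depends on y[i-1]), which is one reason parallel implementations typically split the mesh into subdomains, as this work does.

```python
from typing import List


def apply_A(x: List[float], r: float) -> List[float]:
    """Backward-Euler 1D diffusion operator (1+2r) x_i - r x_{i-1} - r x_{i+1}, zero Dirichlet BCs."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        out.append((1.0 + 2.0 * r) * x[i] - r * left - r * right)
    return out


def lusgs_solve(b: List[float], r: float, tol: float = 1e-12, max_iter: int = 500) -> List[float]:
    """Iterate x <- x + M^{-1}(b - A x) with the LU-SGS factorization M = (D+L) D^{-1} (D+U)."""
    n = len(b)
    d = 1.0 + 2.0 * r  # diagonal entry of D
    x = [0.0] * n
    for _ in range(max_iter):
        res = [bi - ai for bi, ai in zip(b, apply_A(x, r))]
        if max(abs(v) for v in res) < tol:
            break
        # Forward (lower) sweep: (D + L) y = res, where L holds -r on the sub-diagonal.
        y = [0.0] * n
        for i in range(n):
            y[i] = (res[i] + (r * y[i - 1] if i > 0 else 0.0)) / d
        # Backward (upper) sweep: (D + U) dx = D y, where U holds -r on the super-diagonal.
        dx = [0.0] * n
        for i in reversed(range(n)):
            dx[i] = (d * y[i] + (r * dx[i + 1] if i < n - 1 else 0.0)) / d
        x = [xi + di for xi, di in zip(x, dx)]
    return x
```

For example, `lusgs_solve([1.0] * 20, 1.0)` converges to the implicit update for a uniform right-hand side; since the matrix is symmetric positive definite, the symmetric Gauss-Seidel iteration is guaranteed to converge.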
A parallel domain decomposition-based implicit method for the Cahn–Hilliard–Cook phase-field equation in 3D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zheng, Xiang; Yang, Chao; State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190

2015-03-15

We present a numerical algorithm for simulating the spinodal decomposition described by the three-dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady-state solutions of the CHC equation in two and three dimensions. The effect of thermal fluctuation on the spinodal decomposition process is studied. We show that thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of parallel performance, the implicit domain decomposition approach is found to scale well on supercomputers with a large number of processors.
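The abstract does not specify the adaptive time-stepping rule. As a hypothetical illustration only, one simple family of controllers adjusts Δt from the Newton iteration count of the previous step: grow the step when Newton converges quickly (as tends to happen near equilibrium), shrink it when convergence is slow, and cut it back on failure. The function name, thresholds, and factors below are arbitrary assumptions.

```python
def next_time_step(dt: float, newton_iters: int, converged: bool,
                   target_iters: int = 4, grow: float = 2.0, shrink: float = 0.5,
                   dt_min: float = 1e-10, dt_max: float = 1e3) -> float:
    """Hypothetical Newton-iteration-based step-size controller.

    - Newton failed: retry with a smaller step.
    - Converged quickly: enlarge the step.
    - Converged slowly: shrink the step in proportion to the extra iterations.
    """
    if not converged:
        return max(dt * shrink, dt_min)
    if newton_iters <= target_iters:
        return min(dt * grow, dt_max)
    return max(dt * target_iters / newton_iters, dt_min)
```

A failed step would be rejected and repeated with the returned Δt; the clamps keep the step within a user-chosen range.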
Motivation modulates the effect of approach on implicit preferences.

PubMed

Zogmaister, Cristina; Perugini, Marco; Richetin, Juliette

2016-08-01

In three studies, we investigated whether motivational states can modulate the formation of implicit preferences. In Study 1, participants played a video game in which they repeatedly approached one of two similar beverages, while disregarding the other. A subsequent implicit preference for the target beverage emerged, which increased with participants' thirst. In Study 2, participants approached one brand of potato chips while avoiding the other: Conceptually replicating the moderation observed in Study 1, the implicit preference for the approached brand increased with the number of hours from last food intake. In Study 3, we experimentally manipulated hunger, and the moderation effect emerged again, with hungry participants displaying a higher implicit preference for the approached brand, as compared to satiated participants. In the three studies, the moderation effect was not paralleled in explicit preferences although the latter were affected by the preference inducing manipulation. Theoretical implications and open questions are discussed.
Explicit and Implicit Approach Motivation Interact to Predict Interpersonal Arrogance

PubMed Central

Robinson, Michael D.; Ode, Scott; Palder, Spencer L.; Fetterman, Adam K.

2012-01-01

Self-reports of approach motivation are unlikely to be sufficient in understanding the extent to which the individual reacts to appetitive cues in an approach-related manner. A novel implicit probe of approach tendencies was thus developed, one that assessed the extent to which positive affective (versus neutral) stimuli primed larger size estimates, as larger perceptual sizes co-occur with locomotion toward objects in the environment. In two studies (total N = 150), self-reports of approach motivation interacted with this implicit probe of approach motivation to predict individual differences in arrogance, a broad interpersonal dimension previously linked to narcissism, antisocial personality tendencies, and aggression. The results of the two studies were highly parallel in that self-reported levels of approach motivation predicted interpersonal arrogance in the particular context of high, but not low, levels of implicit approach motivation. Implications for understanding approach motivation, implicit probes of it, and problematic approach-related outcomes are discussed. PMID:22399360
Explicit and implicit approach motivation interact to predict interpersonal arrogance.

PubMed

Robinson, Michael D; Ode, Scott; Palder, Spencer L; Fetterman, Adam K

2012-07-01

Self-reports of approach motivation are unlikely to be sufficient in understanding the extent to which the individual reacts to appetitive cues in an approach-related manner. A novel implicit probe of approach tendencies was thus developed, one that assessed the extent to which positive affective (versus neutral) stimuli primed larger size estimates, as larger perceptual sizes co-occur with locomotion toward objects in the environment. In two studies (total N = 150), self-reports of approach motivation interacted with this implicit probe of approach motivation to predict individual differences in arrogance, a broad interpersonal dimension previously linked to narcissism, antisocial personality tendencies, and aggression. The results of the two studies were highly parallel in that self-reported levels of approach motivation predicted interpersonal arrogance in the particular context of high, but not low, levels of implicit approach motivation. Implications for understanding approach motivation, implicit probes of it, and problematic approach-related outcomes are discussed.
Using behavior-analytic implicit tests to assess sexual interests among normal and sex-offender populations

PubMed Central

Roche, Bryan; O’Reilly, Anthony; Gavin, Amanda; Ruiz, Maria R.; Arancibia, Gabriela

2012-01-01

Background: Implicit tests for measuring biases and behavioral predispositions are a recent development within psychology. While such tests are usually researched within a social-cognitive paradigm, behavioral researchers have also begun to view these tests as potential tests of conditioning histories, including in the sexual domain. Objective: The objective of this paper is to illustrate the utility of a behavioral approach to implicit testing and the means by which implicit tests can be built to the standards of behavioral psychologists. Design: Research findings illustrating the short history of implicit testing within the experimental analysis of behavior are reviewed. Relevant parallel and overlapping research findings from the field of social cognition and on the Implicit Association Test are also outlined. Results: New preliminary data obtained with both normal and sex offender populations are described in order to illustrate how behavior-analytically conceived implicit tests may have potential as investigative tools for assessing histories of sexual arousal conditioning and derived stimulus associations. Conclusion: It is concluded that popular implicit tests are likely sensitive to conditioned and derived stimulus associations in the history of the test-taker rather than to ‘unconscious cognitions’ per se. PMID:24693346
Implicit identification with drug and alcohol use predicts retention in residential rehabilitation programs.

PubMed

Wolff, Nathan; von Hippel, Courtney; Brener, Loren; von Hippel, William

2015-03-01

Research has identified numerous factors associated with successful treatment in alcohol and drug rehabilitation programs, yet treatment completion rates are often low and subsequent relapse rates very high. We propose that people's implicit identification with drugs and alcohol may be an additional factor that impacts their ability to complete abstinence-based rehabilitation programs. In the current research, we measured implicit identification with drugs and alcohol using the Implicit Association Test (Greenwald, McGhee, & Schwartz, 1998) among 137 members of a residential rehabilitation program for drugs and alcohol (104 men; mean age = 35 years old, 47 of whom were court-ordered to attend). Implicit identification with drugs and alcohol was measured within 1 week of arrival and again 3 weeks later, prior to the onset of the treatment phase of the program. Duration in rehabilitation was assessed 1 year later. Consistent with predictions, implicit identification with drugs and alcohol predicted the duration that people remained in residential rehabilitation even though a self-report measure of identification with drugs and alcohol did not. These results suggest that implicit identification with drugs and alcohol might be an important predictor of treatment outcomes, even among those with serious problems with drug and alcohol use. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Development and Verification of the Charring Ablating Thermal Protection Implicit System Solver

NASA Technical Reports Server (NTRS)

Amar, Adam J.; Calvert, Nathan D.; Kirk, Benjamin S.

2010-01-01

The development and verification of the Charring Ablating Thermal Protection Implicit System Solver is presented. This work concentrates on the derivation and verification of the stationary grid terms in the equations that govern three-dimensional heat and mass transfer for charring thermal protection systems, including pyrolysis gas flow through the porous char layer. The governing equations are discretized according to the Galerkin finite element method with first- and second-order implicit time integrators. The governing equations are fully coupled and are solved in parallel via Newton's method, while the fully implicit linear system is solved with the Generalized Minimal Residual method. Verification results from exact solutions and the Method of Manufactured Solutions are presented to show spatial and temporal orders of accuracy as well as nonlinear convergence rates.
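The Method of Manufactured Solutions can be illustrated on a much simpler problem than the charring-ablator equations: pick an exact solution u_m(x) = sin(πx), derive the source term f = -u_m'' = π² sin(πx) analytically, solve -u'' = f on (0, 1) with a second-order finite-difference scheme, and confirm that the error falls at the design rate. This Python sketch is a toy example under those assumptions, not the Galerkin finite element solver described above.

```python
import math
from typing import List


def thomas(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Solve a tridiagonal system (sub-diagonal a, diagonal b, super-diagonal c, rhs d)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def mms_max_error(n: int) -> float:
    """Solve -u'' = f, u(0) = u(1) = 0, with f manufactured from u_m(x) = sin(pi x)."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]  # f = -u_m''
    k = 1.0 / h ** 2
    u = thomas([-k] * n, [2.0 * k] * n, [-k] * n, f)        # 2nd-order central differences
    return max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))


if __name__ == "__main__":
    errors = [mms_max_error(n) for n in (15, 31, 63)]       # h = 1/16, 1/32, 1/64
    orders = [math.log(errors[i] / errors[i + 1], 2) for i in range(2)]
    print(orders)  # both close to 2, the scheme's design order
```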

A Theory Upon Origin of Implicit Musical Language.

PubMed

Vas József, P

2015-11-30

The author suggests that the origin of musicality lies in an implicit musical language that every human being possesses in utero, owing to resonance and attunement with the prenatal environment, mainly the mother. It is emphasized that ego development and the evolving implicit musical language can be regarded as parallel processes. To support this idea, the author presents many examples of musical representations. Music is viewed as a tone of ego-functioning involving the musical representations of bodily and visceral senses, cross-modal perception, unity of sense of self, individual fate of ego, and tripolar and bipolar musical coping codes. Finally, a special form of music therapy is presented to illustrate how implicit musical language can be transformed into explicit language by virtue of participants' spontaneity, creativity, and playfulness.
A Theory Upon Origin of Implicit Musical Language

PubMed Central

Vas József, P.

2015-01-01

The author suggests that the origin of musicality lies in an implicit musical language that every human being possesses in utero, owing to resonance and attunement with the prenatal environment, mainly the mother. It is emphasized that ego development and the evolving implicit musical language can be regarded as parallel processes. To support this idea, the author presents many examples of musical representations. Music is viewed as a tone of ego-functioning involving the musical representations of bodily and visceral senses, cross-modal perception, unity of sense of self, individual fate of ego, and tripolar and bipolar musical coping codes. Finally, a special form of music therapy is presented to illustrate how implicit musical language can be transformed into explicit language by virtue of participants’ spontaneity, creativity, and playfulness. PMID:26973966
Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

NASA Astrophysics Data System (ADS)

Xing, F.; Masson, R.; Lopez, S.

2017-09-01

This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed, and non-immersed fractures. The so-called hybrid-dimensional model, which uses a 2D model in the fractures coupled with a 3D model in the matrix, is first derived rigorously from the equi-dimensional matrix-fracture model. It is then discretized using fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme, which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton-type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence, and linear convergence.
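The role of a single ghost-cell layer in an SPMD code can be sketched without MPI by simulating the ranks inside one Python process. The functions below are hypothetical: `add_ghost_layer` stands in for the neighbour message exchange, after which each subdomain applies a purely local three-point stencil to the cells it owns. Matching the distributed result against a serial sweep shows why one ghost layer suffices for local assembly with a nearest-neighbour stencil. The sketch assumes a 1D mesh and at least one cell per subdomain.

```python
from typing import List


def partition(u: List[float], nparts: int) -> List[List[float]]:
    """Split a 1D array into nparts contiguous, nearly equal subdomains (one per simulated rank)."""
    base, extra = divmod(len(u), nparts)
    parts, start = [], 0
    for p in range(nparts):
        size = base + (1 if p < extra else 0)
        parts.append(u[start:start + size])
        start += size
    return parts


def add_ghost_layer(parts: List[List[float]], left_bc: float = 0.0,
                    right_bc: float = 0.0) -> List[List[float]]:
    """Pad each subdomain with one ghost cell per side, copied from its neighbours.

    In an MPI code this copy would be a message exchange between neighbouring ranks.
    """
    padded = []
    for p, local in enumerate(parts):
        left = parts[p - 1][-1] if p > 0 else left_bc
        right = parts[p + 1][0] if p < len(parts) - 1 else right_bc
        padded.append([left] + local + [right])
    return padded


def local_update(padded: List[float]) -> List[float]:
    """Purely local stencil update on owned cells: a Jacobi average of the two neighbours."""
    return [0.5 * (padded[i - 1] + padded[i + 1]) for i in range(1, len(padded) - 1)]


if __name__ == "__main__":
    u = [float(i * i) for i in range(10)]
    distributed = [v for p in add_ghost_layer(partition(u, 3)) for v in local_update(p)]
    serial = local_update([0.0] + u + [0.0])
    print(distributed == serial)  # True: the ghosts give each subdomain everything it needs
```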
A Data Parallel Multizone Navier-Stokes Code

NASA Technical Reports Server (NTRS)

Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

1995-01-01

We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.
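In the chimera approach, flow data in the overlap region are interpolated from a donor zone onto receptor points of another zone. A 1D linear-interpolation sketch in Python conveys the idea; production chimera codes perform a multi-dimensional donor search and typically use trilinear or higher-order interpolation, and the function name here is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_from_donor(donor_x: Sequence[float], donor_u: Sequence[float],
                           receptor_x: Sequence[float]) -> List[float]:
    """Linearly interpolate donor-zone values onto receptor points in the overlap region.

    donor_x must be strictly increasing with at least two points; receptors outside the
    donor extent are rejected (a real overset code would try another donor or flag an orphan).
    """
    out = []
    for x in receptor_x:
        if not donor_x[0] <= x <= donor_x[-1]:
            raise ValueError(f"receptor point {x} lies outside the donor zone")
        k = min(bisect_right(donor_x, x), len(donor_x) - 1)  # right end of the donor cell
        x0, x1 = donor_x[k - 1], donor_x[k]
        t = (x - x0) / (x1 - x0)
        out.append((1.0 - t) * donor_u[k - 1] + t * donor_u[k])
    return out
```

For instance, with donor points [0, 1, 2, 3] carrying u = 2x + 1, a receptor at x = 2.25 receives 5.5, since linear interpolation reproduces linear data exactly.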
Propranolol reduces implicit negative racial bias.

PubMed

Terbeck, Sylvia; Kahane, Guy; McTavish, Sarah; Savulescu, Julian; Cowen, Philip J; Hewstone, Miles

2012-08-01

Implicit negative attitudes towards other races are important in certain kinds of prejudicial social behaviour. Emotional mechanisms are thought to be involved in mediating implicit "outgroup" bias, but there is little evidence concerning the underlying neurobiology. The aim of the present study was to examine the role of noradrenergic mechanisms in the generation of implicit racial attitudes. Healthy volunteers (n = 36) of white ethnic origin received a single oral dose of the β-adrenoceptor antagonist propranolol (40 mg) in a randomised, double-blind, parallel-group, placebo-controlled design. Participants completed an explicit measure of prejudice and the racial implicit association test (IAT), 1-2 h after propranolol administration. Relative to placebo, propranolol significantly lowered heart rate and abolished implicit racial bias, without affecting the measure of explicit racial prejudice. Propranolol did not affect subjective mood. Our results indicate that β-adrenoceptors play a role in the expression of implicit racial attitudes suggesting that noradrenaline-related emotional mechanisms may mediate negative racial bias. Our findings may also have practical importance given that propranolol is a widely used drug. However, further studies will be needed to examine whether a similar effect can be demonstrated in the course of clinical treatment.
Are we puppets on a string? Comparing the impact of contingency and validity on implicit and explicit evaluations.

PubMed

Peters, Kurt R; Gawronski, Bertram

2011-04-01

Research has demonstrated that implicit and explicit evaluations of the same object can diverge. Explanations of such dissociations frequently appeal to dual-process theories, such that implicit evaluations are assumed to reflect object-valence contingencies independent of their perceived validity, whereas explicit evaluations reflect the perceived validity of object-valence contingencies. Although there is evidence supporting these assumptions, it remains unclear if dissociations can arise in situations in which object-valence contingencies are judged to be true or false during the learning of these contingencies. Challenging dual-process accounts that propose a simultaneous operation of two parallel learning mechanisms, results from three experiments showed that the perceived validity of evaluative information about social targets qualified both explicit and implicit evaluations when validity information was available immediately after the encoding of the valence information; however, delaying the presentation of validity information reduced its qualifying impact for implicit, but not explicit, evaluations.
Measuring automatic retrieval: a comparison of implicit memory, process dissociation, and speeded response procedures.

PubMed

Horton, Keith D; Wilson, Daryl E; Vonk, Jennifer; Kirby, Sarah L; Nielsen, Tina

2005-07-01

Using the stem completion task, we compared estimates of automatic retrieval from an implicit memory task, the process dissociation procedure, and the speeded response procedure. Two standard manipulations were employed. In Experiment 1, a depth of processing effect was found on automatic retrieval using the speeded response procedure although this effect was substantially reduced in Experiment 2 when lexical processing was required of all words. In Experiment 3, the speeded response procedure showed an advantage of full versus divided attention at study on automatic retrieval. An implicit condition showed parallel effects in each study, suggesting that implicit stem completion may normally provide a good estimate of automatic retrieval. Also, we replicated earlier findings from the process dissociation procedure, but estimates of automatic retrieval from this procedure were consistently lower than those from the speeded response procedure, except when conscious retrieval was relatively low. We discuss several factors that may contribute to the conflicting outcomes, including the evidence for theoretical assumptions and criterial task differences between implicit and explicit tests.
Newton-like methods for Navier-Stokes solution

NASA Astrophysics Data System (ADS)

Qin, N.; Xu, X.; Richards, B. E.

1992-12-01

The paper reports on Newton-like methods, called SFDN-alpha-GMRES and SQN-alpha-GMRES, that have been devised and shown to be powerful schemes for large nonlinear problems typical of viscous compressible Navier-Stokes solutions. They can be applied starting from a partially converged solution obtained with a conventional explicit or approximate implicit method. Developments have included the efficient parallelization of the schemes on a distributed-memory parallel computer. The methods are illustrated using a RISC workstation and a transputer parallel system, respectively, to solve a hypersonic vortical flow.
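The abstract does not describe the internals of SFDN-alpha-GMRES and SQN-alpha-GMRES (including the role of alpha), so the following is only a generic sketch of a related idea: a Newton iteration whose linear systems are solved by GMRES, with Jacobian-vector products approximated by finite differences, J(u)v ≈ (F(u + hv) - F(u))/h. All names, the rule for choosing h, and the tolerances are assumptions for illustration.

```python
import math
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def axpy(alpha: float, x: Vec, y: Vec) -> Vec:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def fd_jv(F: Callable[[Vec], Vec], u: Vec, Fu: Vec, v: Vec, eps: float = 1e-7) -> Vec:
    """Finite-difference Jacobian-vector product: J(u) v ~ (F(u + h v) - F(u)) / h."""
    vn = norm(v)
    if vn == 0.0:
        return [0.0] * len(u)
    h = eps * (1.0 + norm(u)) / vn
    Fp = F(axpy(h, v, u))
    return [(a - b) / h for a, b in zip(Fp, Fu)]


def gmres(matvec: Callable[[Vec], Vec], b: Vec, m: int = 30, tol: float = 1e-10) -> Vec:
    """Unrestarted GMRES from x0 = 0 (Arnoldi with modified Gram-Schmidt + Givens rotations)."""
    n, beta = len(b), norm(b)
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    cs, sn = [0.0] * m, [0.0] * m
    g = [beta] + [0.0] * m
    k = 0
    for j in range(m):
        w = matvec(V[j])
        for i in range(j + 1):
            H[i][j] = dot(w, V[i])
            w = axpy(-H[i][j], V[i], w)
        h_next = norm(w)
        for i in range(j):  # apply earlier rotations to the new column
            t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
            H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
            H[i][j] = t
        r = math.hypot(H[j][j], h_next)  # new rotation zeroes the sub-diagonal entry
        cs[j], sn[j] = H[j][j] / r, h_next / r
        H[j][j] = r
        g[j + 1], g[j] = -sn[j] * g[j], cs[j] * g[j]
        k = j + 1
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break
        V.append([wi / h_next for wi in w])
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution on the triangular system
        y[i] = (g[i] - sum(H[i][l] * y[l] for l in range(i + 1, k))) / H[i][i]
    x = [0.0] * n
    for i in range(k):
        x = axpy(y[i], V[i], x)
    return x


def jfnk(F: Callable[[Vec], Vec], u0: Vec, tol: float = 1e-10, max_newton: int = 30) -> Vec:
    """Newton's method where each linear solve J du = -F uses GMRES with fd_jv products."""
    u = list(u0)
    for _ in range(max_newton):
        Fu = F(u)
        if norm(Fu) < tol:
            break
        du = gmres(lambda v: fd_jv(F, u, Fu, v), [-f for f in Fu])
        u = axpy(1.0, du, u)
    return u
```

For example, `jfnk(lambda u: [u[0]**2 + u[1]**2 - 4.0, u[0] - u[1]], [1.0, 2.0])` converges to approximately (√2, √2). The appeal of the matrix-free product is that only residual evaluations are needed, so the linear solver typically parallelizes as readily as the residual computation itself.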
Operationalizing the Implicit Curriculum in MSW Distance Education Programs

ERIC Educational Resources Information Center

Quinn, Andrew; Barth, Anna M.

2014-01-01

Sixteen MSW distance programs provided insight into how the implicit curriculum currently exists within their programs. Overall, distance programs carried out the activities necessary for student development; the student population made for a more diverse learning community; and faculty were receiving training. There was still a heavy reliance on…
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue

NASA Astrophysics Data System (ADS)

Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.

2017-11-01

The flow driven by a rotating impeller inside an open, fixed cylindrical cavity is simulated using code Blue, a solver for massively parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination, all attached to a central hub and tube stem. In Blue, solid forms are constructed by defining immersed objects via a distance function that accounts for the object's interaction with the flow for both single- and two-phase flows. We use a moving-frame technique to impose translation and/or rotation. Variations in the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed to handle complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package

NASA Astrophysics Data System (ADS)

Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin

2015-01-01

A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.
CHARMM: The Biomolecular Simulation Program

PubMed Central

Brooks, B.R.; Brooks, C.L.; MacKerell, A.D.; Nilsson, L.; Petrella, R.J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A.R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R.W.; Post, C.B.; Pu, J.Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D.M.; Karplus, M.

2009-01-01

CHARMM (Chemistry at HARvard Molecular Mechanics) is a highly versatile and widely used molecular simulation program. It has been developed over the last three decades with a primary focus on molecules of biological interest, including proteins, peptides, lipids, nucleic acids, carbohydrates and small molecule ligands, as they occur in solution, crystals, and membrane environments. For the study of such systems, the program provides a large suite of computational tools that include numerous conformational and path sampling methods, free energy estimators, molecular minimization, dynamics, and analysis techniques, and model-building capabilities. In addition, the CHARMM program is applicable to problems involving a much broader class of many-particle systems. Calculations with CHARMM can be performed using a number of different energy functions and models, from mixed quantum mechanical-molecular mechanical force fields, to all-atom classical potential energy functions with explicit solvent and various boundary conditions, to implicit solvent and membrane models. The program has been ported to numerous platforms in both serial and parallel architectures. This paper provides an overview of the program as it exists today with an emphasis on developments since the publication of the original CHARMM paper in 1983. PMID:19444816
Religion insulates ingroup evaluations: the development of intergroup attitudes in India.

PubMed

Dunham, Yarrow; Srinivasan, Mahesh; Dotsch, Ron; Barner, David

2014-03-01

Research on the development of implicit intergroup attitudes has placed heavy emphasis on race, leaving open how social categories that are prominent in other cultures might operate. We investigate two of India's primary means of social distinction, caste and religion, and explore the development of implicit and explicit attitudes towards these groups in minority-status Muslim children and majority-status Hindu children, the latter drawn from various positions in the Hindu caste system. Results from two tests of implicit attitudes find that caste attitudes parallel previous findings for race: higher-caste children as well as lower-caste children have robust high-caste preferences. However, results for religion were strikingly different: both lower-status Muslim children and higher-status Hindu children show strong implicit ingroup preferences. We suggest that religion may play a protective role in insulating children from the internalization of stigma. © 2013 John Wiley & Sons Ltd.
Implicit and explicit processing in deep dyslexia: Semantic blocking as a test for failure of inhibition in the phonological output lexicon.

PubMed

Colangelo, Annette; Buchanan, Lori

2006-12-01

The failure of inhibition hypothesis posits a theoretical distinction between implicit and explicit access in deep dyslexia. Specifically, the effects of failure of inhibition are assumed only in conditions that have an explicit selection requirement in the context of production (i.e., aloud reading). In contrast, the failure of inhibition hypothesis proposes that implicit processing and explicit access to semantic information without production demands are intact in deep dyslexia. Evidence for intact implicit and explicit access requires that performance in deep dyslexia parallels that observed in neurologically intact participants on tasks based on implicit and explicit processes. In other words, deep dyslexics should produce normal effects in conditions with implicit task demands (i.e., lexical decision) and on tasks based on explicit access without production (i.e., forced choice semantic decisions) because failure of inhibition does not impact the availability of lexical information, only explicit retrieval in the context of production. This research examined the distinction between implicit and explicit processes in deep dyslexia using semantic blocking in lexical decision and forced choice semantic decisions as a test for the failure of inhibition hypothesis. The results of the semantic blocking paradigm support the distinction between implicit and explicit processing and provide evidence for failure of inhibition as an explanation for semantic errors in deep dyslexia.
Explicit and implicit learning: The case of computer programming

NASA Astrophysics Data System (ADS)

Mancy, Rebecca

The central question of this thesis concerns the role of explicit and implicit learning in the acquisition of a complex skill, namely computer programming. This issue is explored with reference to information processing models of memory drawn from cognitive science. These models indicate that conscious information processing occurs in working memory where information is stored and manipulated online, but that this mode of processing shows serious limitations in terms of capacity or resources. Some information processing models also indicate information processing in the absence of conscious awareness through automation and implicit learning. It was hypothesised that students would demonstrate implicit and explicit knowledge and that both would contribute to their performance in programming. This hypothesis was investigated via two empirical studies. The first concentrated on temporary storage and online processing in working memory and the second on implicit and explicit knowledge. Storage and processing were tested using two tools: temporary storage capacity was measured using a digit span test; processing was investigated with a disembedding test. The results were used to calculate correlation coefficients with performance on programming examinations. Individual differences in temporary storage had only a small role in predicting programming performance and this factor was not a major determinant of success. Individual differences in disembedding were more strongly related to programming achievement. The second study used interviews to investigate the use of implicit and explicit knowledge. Data were analysed according to a grounded theory paradigm. The results indicated that students possessed implicit and explicit knowledge, but that the balance between the two varied between students and that the most successful students did not necessarily possess greater explicit knowledge. 
The ways in which students described their knowledge led to the development of a framework which extends beyond the implicit-explicit dichotomy to four descriptive categories of knowledge along this dimension. Overall, the results demonstrated that explicit and implicit knowledge both contribute to the acquisition of programming skills. Suggestions are made for further research, and the results are discussed in the context of their implications for education.
How Mentor Identity Evolves: Findings From a 10-Year Follow-Up Study of a National Professional Development Program.

PubMed

Balmer, Dorene F; Darden, Alix; Chandran, Latha; D'Alessandro, Donna; Gusic, Maryellen E

2018-02-20

Despite academic medicine's endorsement of professional development and mentoring, little is known about what junior faculty learn about mentoring in the implicit curriculum of professional development programs, and how their mentor identity evolves in this context. The authors explored what faculty-participants in the Educational Scholars Program implicitly learned about mentoring and how the implicit curriculum affected mentor identity transformation. Semi-structured interviews with 19 of 36 former faculty-participants were conducted in 2016. Consistent with constructivist grounded theory, data collection and analysis overlapped. The authors created initial codes informed by Ibarra's model for identity transformation, iteratively revised codes based on patterns in incoming data, and created visual representations of relationships amongst codes in order to gain a holistic and shared understanding of the data. In the implicit curriculum, faculty-participants learned the importance of having multiple mentors, the value of peer mentors, and the incremental process of becoming a mentor. The authors used Ibarra's model to understand how the implicit curriculum worked to transform mentor identity: faculty-participants reported observing mentors, experimenting with different ways to mentor and to be a mentor, and evaluating themselves as mentors. The Educational Scholars Program's implicit curriculum facilitated faculty-participants taking on a mentor identity via opportunities it afforded to watch mentors, experiment with mentoring, and evaluate self as mentor, key ingredients for professional identity construction. Leaders of professional development programs can develop faculty as mentors by capitalizing on what faculty-participants learn in the implicit curriculum and deliberately structuring post-graduation mentoring opportunities.
Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning.

PubMed

McDougle, Samuel D; Bond, Krista M; Taylor, Jordan A

2015-07-01

A popular model of human sensorimotor learning suggests that a fast process and a slow process work in parallel to produce the canonical learning curve (Smith et al., 2006). Recent evidence supports the subdivision of sensorimotor learning into explicit and implicit processes that simultaneously subserve task performance (Taylor et al., 2014). We set out to test whether these two accounts of learning processes are homologous. Using a recently developed method to assay explicit and implicit learning directly in a sensorimotor task, along with a computational modeling analysis, we show that the fast process closely resembles explicit learning and the slow process approximates implicit learning. In addition, we provide evidence for a subdivision of the slow/implicit process into distinct manifestations of motor memory. We conclude that the two-state model of motor learning is a close approximation of sensorimotor learning, but it is unable to describe adequately the various implicit learning operations that forge the learning curve. Our results suggest that a wider net be cast in the search for the putative psychological mechanisms and neural substrates underlying the multiplicity of processes involved in motor learning. Copyright © 2015 the authors.
Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning

PubMed Central

Bond, Krista M.; Taylor, Jordan A.

2015-01-01

A popular model of human sensorimotor learning suggests that a fast process and a slow process work in parallel to produce the canonical learning curve (Smith et al., 2006). Recent evidence supports the subdivision of sensorimotor learning into explicit and implicit processes that simultaneously subserve task performance (Taylor et al., 2014). We set out to test whether these two accounts of learning processes are homologous. Using a recently developed method to assay explicit and implicit learning directly in a sensorimotor task, along with a computational modeling analysis, we show that the fast process closely resembles explicit learning and the slow process approximates implicit learning. In addition, we provide evidence for a subdivision of the slow/implicit process into distinct manifestations of motor memory. We conclude that the two-state model of motor learning is a close approximation of sensorimotor learning, but it is unable to describe adequately the various implicit learning operations that forge the learning curve. Our results suggest that a wider net be cast in the search for the putative psychological mechanisms and neural substrates underlying the multiplicity of processes involved in motor learning. PMID:26134640
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver

NASA Astrophysics Data System (ADS)

Chacon, Luis

2008-11-01

Recently, in an invited talk at DPP07, an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chacón, Phys. Plasmas 15, 056103 (2008).
Change in explicit and implicit motivation toward physical activity and sedentary behavior in pulmonary rehabilitation and associations with postrehabilitation behaviors.

PubMed

Chevance, Guillaume; Héraud, Nelly; Varray, Alain; Boiché, Julie

2017-05-01

The aim of this study was twofold: (a) to determine whether Theory of Planned Behavior (TPB) variables and implicit attitudes toward physical activity and sedentary behavior would change during a 5-week pulmonary rehabilitation (PR) program, and (b) to investigate the relationships between behavioral intentions, implicit attitudes, physical activity, and sedentary behavior in postrehabilitation. Out of 142 patients with respiratory disease included in this study, 119 completed 2 questionnaires measuring TPB variables with regard to physical activity and sedentary behavior, and an Implicit Association Test (IAT) measuring implicit attitudes toward physical activity in contrast to sedentary behavior. The TPB questionnaires and the IAT were administered at the beginning (Time 1) and the end of the program (Time 2). Six months after the program (Time 3), 62 patients provided self-reported measures of their recreational physical activity and screen-based, leisure-time sedentary behavior. Over the course of pulmonary rehabilitation, perceived behavioral control and intentions toward physical activity increased, as did social norms and perceived behavioral control toward sedentary behavior; implicit attitudes were also more positive toward physical activity. Implicit attitudes at the end of PR (Time 2) were significantly associated with postrehabilitation physical activity (Time 3). TPB variables toward physical activity and sedentary behavior as well as implicit attitudes were enhanced during PR. At 6 months, implicit attitudes were significantly associated with physical activity. These results suggest that motivation, particularly implicit attitudes, should be targeted in future behavioral interventions in order to optimize the effects of rehabilitation on physical activity maintenance. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

Combustion of hydrogen injected into a supersonic airstream (a guide to the HISS computer program)

NASA Technical Reports Server (NTRS)

Dyer, D. F.; Maples, G.; Spalding, D. B.

1976-01-01

A computer program based on a finite-difference, implicit numerical integration scheme is described for the prediction of hydrogen injected into a supersonic airstream at an angle ranging from normal to parallel to the airstream main flow direction. Results of calculations for flow and thermal property distributions were compared with 'cold flow data' taken by NASA/Langley and show excellent correlation. Typical results for equilibrium combustion are presented and exhibit qualitatively plausible behavior. Computer time required for a given case is approximately one minute on a CDC 7600. A discussion of the assumption of parabolic flow in the injection region is given which demonstrates that improvement in calculation in this region could be obtained by a partially-parabolic procedure which has been developed. It is concluded that the technique described provides an efficient and reliable means for analyzing hydrogen injection into supersonic airstreams and the subsequent combustion.
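The abstract describes the scheme only as finite-difference and implicit. As a hedged illustration of what a single implicit step involves (not the HISS formulation itself), the sketch below advances the 1D diffusion equation u_t = D u_xx by one backward-Euler step. Each such step reduces to a tridiagonal linear system, solved here with the Thomas algorithm. The function names and the fixed-end-value (Dirichlet) boundary treatment are assumptions made for the example.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    No pivoting, so the matrix should be diagonally dominant.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_diffusion_step(u, r):
    """One backward-Euler step of u_t = D u_xx with fixed end values.

    r = D * dt / dx**2. Interior rows solve
    -r*u[i-1] + (1 + 2r)*u[i] - r*u[i+1] = u_old[i],
    which is stable for any r > 0 (the explicit scheme needs r <= 1/2).
    """
    n = len(u)
    a = [-r] * n
    b = [1.0 + 2.0 * r] * n
    c = [-r] * n
    # Boundary rows: u_new = u_old at both ends (Dirichlet condition).
    a[0], b[0], c[0] = 0.0, 1.0, 0.0
    a[-1], b[-1], c[-1] = 0.0, 1.0, 0.0
    return thomas_solve(a, b, c, list(u))


if __name__ == "__main__":
    u = [0.0] * 11
    u[5] = 1.0  # initial spike
    for _ in range(10):
        u = implicit_diffusion_step(u, r=5.0)  # large r, still stable
    print([round(v, 4) for v in u])
```

The tridiagonal solve costs O(n) per step, which is why implicit marching of this kind stays cheap even though every step couples all grid points.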
Thermal Ablation Modeling for Silicate Materials

NASA Technical Reports Server (NTRS)

Chen, Yih-Kanq

2016-01-01

A thermal ablation model for silicates is proposed. The model includes the mass losses through the balance between evaporation and condensation, and through the moving molten layer driven by surface shear force and pressure gradient. This model can be applied in ablation simulations of meteoroids or glassy Thermal Protection Systems for spacecraft. Time-dependent axisymmetric computations are performed by coupling the fluid dynamics code, Data-Parallel Line Relaxation program, with the material response code, Two-dimensional Implicit Thermal Ablation simulation program, to predict the mass loss rates and shape change. For model validation, the surface recession of a fused amorphous quartz rod is computed, and the recession predictions reasonably agree with available data. The present parametric studies for two groups of meteoroid Earth entry conditions indicate that the mass loss through the moving molten layer is negligibly small for heat-flux conditions at around 1 MW/cm².
Combustion of hydrogen injected into a supersonic airstream (the SHIP computer program)

NASA Technical Reports Server (NTRS)

Markatos, N. C.; Spalding, D. B.; Tatchell, D. G.

1977-01-01

The mathematical and physical basis of the SHIP computer program which embodies a finite-difference, implicit numerical procedure for the computation of hydrogen injected into a supersonic airstream at an angle ranging from normal to parallel to the airstream main flow direction is described. The physical hypotheses built into the program include: a two-equation turbulence model, and a chemical equilibrium model for the hydrogen-oxygen reaction. Typical results for equilibrium combustion are presented and exhibit qualitatively plausible behavior. The computer time required for a given case is approximately 1 minute on a CDC 7600 machine. A discussion of the assumption of parabolic flow in the injection region is given which suggests that improvement in calculation in this region could be obtained by use of the partially parabolic procedure of Pratap and Spalding. It is concluded that the technique described herein provides the basis for an efficient and reliable means for predicting the effects of hydrogen injection into supersonic airstreams and of its subsequent combustion.
Proceedings of the second SISAL users' conference

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J T; Frerking, C; Miller, P J

1992-12-01

This report contains papers on the following topics: A sisal code for computing the Fourier transform on S_N; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the Burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFTs in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.
Element-topology-independent preconditioners for parallel finite element computations

NASA Technical Reports Server (NTRS)

Park, K. C.; Alexander, Scott

1992-01-01

A family of preconditioners for the solution of finite element equations is presented; the preconditioners are element-topology independent and thus applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
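The preconditioners rely on element connectivity matrices and their inverses, but the abstract does not give the preconditioner construction, so the sketch below only illustrates the underlying object. A Boolean connectivity matrix L_e maps global degrees of freedom (DOFs) to an element's local ones, the global matrix is assembled as K = sum over e of L_e^T K_e L_e, and because each row of L_e selects a distinct global DOF, L_e L_e^T = I, so L_e^T is a right inverse of L_e. The two-node bar element and all function names are hypothetical stand-ins for the paper's beam and 2D elements.

```python
def connectivity_matrix(elem_dofs, n_global):
    """Boolean L_e (n_local x n_global): L_e[i][g] = 1 if local DOF i is global DOF g."""
    return [[1.0 if g == dof else 0.0 for g in range(n_global)] for dof in elem_dofs]


def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(row) for row in zip(*A)]


def assemble(element_matrices, element_dofs, n_global):
    """Global K = sum_e L_e^T K_e L_e (written densely for clarity, not speed)."""
    K = [[0.0] * n_global for _ in range(n_global)]
    for Ke, dofs in zip(element_matrices, element_dofs):
        L = connectivity_matrix(dofs, n_global)
        contrib = matmul(transpose(L), matmul(Ke, L))
        for i in range(n_global):
            for j in range(n_global):
                K[i][j] += contrib[i][j]
    return K


if __name__ == "__main__":
    bar = [[1.0, -1.0], [-1.0, 1.0]]  # unit-stiffness two-node bar element
    K = assemble([bar, bar], [[0, 1], [1, 2]], 3)
    print(K)  # two bars in series sharing global DOF 1
```

Production codes scatter element entries directly rather than forming L_e explicitly; the explicit matrices are kept here only because the abstract's method is phrased in terms of them.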
Development and Verification of the Charring, Ablating Thermal Protection Implicit System Simulator

NASA Technical Reports Server (NTRS)

Amar, Adam J.; Calvert, Nathan; Kirk, Benjamin S.

2011-01-01

The development and verification of the Charring Ablating Thermal Protection Implicit System Solver (CATPISS) is presented. This work concentrates on the derivation and verification of the stationary grid terms in the equations that govern three-dimensional heat and mass transfer for charring thermal protection systems, including pyrolysis gas flow through the porous char layer. The governing equations are discretized according to the Galerkin finite element method (FEM) with first- and second-order fully implicit time integrators. The governing equations are fully coupled and are solved in parallel via Newton's method, while the linear system is solved via the Generalized Minimal Residual method (GMRES). Verification results from exact solutions and the Method of Manufactured Solutions (MMS) are presented to show spatial and temporal orders of accuracy as well as nonlinear convergence rates.
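The solution strategy described above (Newton's method on the coupled nonlinear equations, with each Newton linear system solved by GMRES) can be sketched on a toy problem. This is an assumption-laden illustration, not the CATPISS implementation: it uses a dense, user-supplied Jacobian, unrestarted and unpreconditioned GMRES, and a hypothetical two-equation system in place of the heat and mass transfer equations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES from x0 = 0 (Arnoldi + Givens rotations)."""
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]  # orthonormal Krylov basis
    H = []  # H[j] is column j of the rotated (upper-triangular) Hessenberg matrix
    cs, sn = [], []
    g = [beta]  # rotated right-hand side; |g[-1]| is the residual norm
    for j in range(min(max_iter, n)):
        w = matvec(V[j])
        h = []
        for i in range(j + 1):  # modified Gram-Schmidt
            hij = dot(w, V[i])
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            h.append(hij)
        h_next = norm(w)
        h.append(h_next)
        for i in range(j):  # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], h[j + 1])  # new rotation zeroes h[j+1]
        c, s = h[j] / r, h[j + 1] / r
        cs.append(c)
        sn.append(s)
        h[j], h[j + 1] = r, 0.0
        g.append(-s * g[j])
        g[j] = c * g[j]
        H.append(h)
        if abs(g[j + 1]) < tol * beta or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    k = len(H)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution on R y = g
        y[i] = (g[i] - sum(H[m][i] * y[m] for m in range(i + 1, k))) / H[i][i]
    return [sum(V[m][idx] * y[m] for m in range(k)) for idx in range(n)]


def newton_gmres(F, J, x0, tol=1e-10, max_newton=20):
    """Newton iteration: solve J(x) dx = -F(x) with GMRES, then x += dx."""
    x = list(x0)
    for _ in range(max_newton):
        fx = F(x)
        if norm(fx) < tol:
            break
        Jx = J(x)

        def matvec(v, Jx=Jx):
            return [dot(row, v) for row in Jx]

        dx = gmres(matvec, [-f for f in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    return x


if __name__ == "__main__":
    F = lambda x: [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]
    J = lambda x: [[2 * x[0], 2 * x[1]], [1.0, -1.0]]
    print(newton_gmres(F, J, [1.0, 0.5]))  # approaches (sqrt(2), sqrt(2))
```

GMRES only needs matrix-vector products, which is one reason it pairs well with distributed Newton solvers: the Jacobian never has to be factored.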
ZettaBricks: A Language Compiler and Runtime System for Anyscale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amarasinghe, Saman

This grant supported the ZettaBricks and OpenTuner projects. ZettaBricks is a new implicitly parallel language and compiler where defining multiple implementations of multiple algorithms to solve a problem is the natural way of programming. ZettaBricks makes algorithmic choice a first-class construct of the language. Choices are provided in a way that also allows our compiler to tune at a finer granularity. The ZettaBricks compiler autotunes programs by making both fine-grained and algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking. Additionally, ZettaBricks introduces novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the ZettaBricks compiler is able to tune a program in such a way that delivers near-optimal efficiency for any desired level of accuracy. The compiler has the flexibility of utilizing different convergence criteria for the various components within a single algorithm, providing the user with accuracy choice alongside algorithmic choice. OpenTuner is a generalization of the experience gained in building an autotuner for ZettaBricks. OpenTuner is a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy-to-use interface for communicating with the program to be autotuned. A key capability inside OpenTuner is the use of ensembles of disparate search techniques simultaneously; techniques that perform well will dynamically be allocated a larger proportion of tests.
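The ensemble idea in the last sentence can be sketched as a multi-armed bandit over search techniques. OpenTuner's own meta-technique is not reproduced here; the sketch swaps in the textbook UCB1 rule, and the reward definition (1 when a technique's proposal beats the best result so far), the technique functions, and the toy objective are all hypothetical.

```python
import math
import random


def ucb1_ensemble(techniques, objective, budget, c=1.0, seed=0):
    """Share `budget` tests among search techniques using the UCB1 rule.

    techniques: dict mapping a name to propose(rng, best_cfg) -> cfg.
    objective: cfg -> cost (lower is better).
    A technique earns reward 1 whenever its proposal beats the best cost so far,
    so techniques that keep finding improvements are chosen more often.
    """
    rng = random.Random(seed)
    counts = {name: 0 for name in techniques}
    rewards = {name: 0.0 for name in techniques}
    best_cfg, best_cost = None, float("inf")
    for t in range(1, budget + 1):
        untried = [name for name in techniques if counts[name] == 0]
        if untried:
            name = untried[0]  # try every technique once first
        else:
            # Mean reward plus an exploration bonus that shrinks with use.
            name = max(
                techniques,
                key=lambda n: rewards[n] / counts[n]
                + c * math.sqrt(math.log(t) / counts[n]),
            )
        cfg = techniques[name](rng, best_cfg)
        cost = objective(cfg)
        counts[name] += 1
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
            rewards[name] += 1.0
    return best_cfg, best_cost, counts


# Hypothetical techniques for a toy integer tuning problem.
def random_search(rng, best):
    return rng.randint(0, 20)


def local_search(rng, best):
    if best is None:
        return rng.randint(0, 20)
    return best + rng.choice([-1, 1])


def fixed_guess(rng, best):
    return 20


if __name__ == "__main__":
    techs = {"random": random_search, "local": local_search, "fixed": fixed_guess}
    cfg, cost, counts = ucb1_ensemble(techs, lambda x: (x - 3) ** 2, budget=200)
    print(cfg, cost, counts)
```

The exploration bonus keeps every technique in play, so one that stops helping early is not starved permanently if it becomes useful later in the search.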
What is "the patient perspective" in patient engagement programs? Implicit logics and parallels to feminist theories.

PubMed

Rowland, Paula; McMillan, Sarah; McGillicuddy, Patti; Richards, Joy

2017-01-01

Public and patient involvement (PPI) in health care may refer to many different processes, ranging from participating in decision-making about one's own care to participating in health services research, health policy development, or organizational reforms. Across these many forms of public and patient involvement, the conceptual and theoretical underpinnings remain poorly articulated. Instead, most public and patient involvement programs rely on policy initiatives as their conceptual frameworks. This lack of conceptual clarity participates in dilemmas of program design, implementation, and evaluation. This study contributes to the development of theoretical understandings of public and patient involvement. In particular, we focus on the deployment of patient engagement programs within health service organizations. To develop a deeper understanding of the conceptual underpinnings of these programs, we examined the concept of "the patient perspective" as used by patient engagement practitioners and participants. Specifically, we focused on the way this phrase was used in the singular: "the" patient perspective or "the" patient voice. From qualitative analysis of interviews with 20 patient advisers and 6 staff members within a large urban health network in Canada, we argue that "the patient perspective" is referred to as a particular kind of situated knowledge, specifically an embodied knowledge of vulnerability. We draw parallels between this logic of patient perspective and the logic of early feminist theory, including the concepts of standpoint theory and strong objectivity. We suggest that champions of patient engagement may learn much from the way feminist theorists have constructed their arguments and addressed critique.
Discrete Diffusion Monte Carlo for Electron Thermal Transport

NASA Astrophysics Data System (ADS)

Chenhall, Jeffrey; Cao, Duc; Wollaeger, Ryan; Moses, Gregory

2014-10-01

The iSNB (implicit Schurtz-Nicolai-Busquet) electron thermal transport method of Cao et al. is adapted to a Discrete Diffusion Monte Carlo (DDMC) solution method for eventual inclusion in a hybrid IMC-DDMC (Implicit Monte Carlo) method. The hybrid method will combine the efficiency of a diffusion method in short mean free path regions with the accuracy of a transport method in long mean free path regions. The Monte Carlo nature of the approach allows the algorithm to be massively parallelized. Work to date on the iSNB-DDMC method will be presented. This work was supported by Sandia National Laboratory - Albuquerque.
Development of an Implicit, Charge and Energy Conserving 2D Electromagnetic PIC Code on Advanced Architectures

NASA Astrophysics Data System (ADS)

Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel

2012-10-01

In order to solve problems such as the ion coalescence and slow MHD shocks fully kinetically we developed a fully implicit 2D energy and charge conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.
Porting plasma physics simulation codes to modern computing architectures using the libmrc framework

NASA Astrophysics Data System (ADS)

Germaschewski, Kai; Abbott, Stephen

2015-11-01

Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source libmrc framework that has been used to modularize and port three plasma physics codes: the extended MHD code MRCv3 with implicit time integration and curvilinear grids; the OpenGGCM global magnetosphere model; and the particle-in-cell code PSC. libmrc consolidates basic functionality needed for simulations based on structured grids (I/O, load balancing, time integrators), and also introduces a parallel object model that makes it possible to maintain multiple implementations of computational kernels, on e.g. conventional processors and GPUs. It handles data layout conversions and enables us to port performance-critical parts of a code to a new architecture step-by-step, while the rest of the code can remain unchanged. We will show examples of the performance gains and some physics applications.
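libmrc itself is a C framework, and its object API is not described in the abstract. The following is only a hypothetical Python sketch of the two ideas named above: keeping several implementations of one kernel side by side, and converting data layouts at the kernel boundary so the rest of the code is unaffected. The kernel, the two layouts (array-of-structs and struct-of-arrays), and the registry are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Particle = Tuple[float, float]  # (position, velocity)


def aos_to_soa(particles: List[Particle]) -> Dict[str, List[float]]:
    """Array-of-structs -> struct-of-arrays."""
    return {"x": [p[0] for p in particles], "v": [p[1] for p in particles]}


def soa_to_aos(soa: Dict[str, List[float]]) -> List[Particle]:
    """Struct-of-arrays -> array-of-structs."""
    return list(zip(soa["x"], soa["v"]))


def push_aos(particles: List[Particle], dt: float) -> List[Particle]:
    # Reference kernel: free streaming, x += v * dt, on array-of-structs data.
    return [(x + v * dt, v) for x, v in particles]


def push_soa(soa: Dict[str, List[float]], dt: float) -> Dict[str, List[float]]:
    # Alternative kernel on struct-of-arrays data, the layout that
    # GPU-style code tends to prefer for contiguous memory access.
    return {"x": [x + v * dt for x, v in zip(soa["x"], soa["v"])], "v": list(soa["v"])}


# Registry: implementation name -> (layout it expects, kernel function).
KERNELS: Dict[str, Tuple[str, Callable]] = {
    "reference": ("aos", push_aos),
    "soa": ("soa", push_soa),
}


def push(particles: List[Particle], dt: float, impl: str = "reference") -> List[Particle]:
    """Call the chosen implementation, converting layouts at the boundary
    so callers always see array-of-structs data."""
    layout, kernel = KERNELS[impl]
    if layout == "aos":
        return kernel(particles, dt)
    return soa_to_aos(kernel(aos_to_soa(particles), dt))


if __name__ == "__main__":
    ps = [(0.0, 1.0), (1.0, -2.0), (2.5, 0.5)]
    print(push(ps, 0.1, "reference"))
    print(push(ps, 0.1, "soa"))  # same result through the other layout
```

Keeping the reference kernel alongside the new one gives a built-in correctness check while a code is ported one kernel at a time.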
Parallelized modelling and solution scheme for hierarchically scaled simulations

NASA Technical Reports Server (NTRS)

Padovan, Joe

1995-01-01

This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to the large reductions in memory, communications, and computational effort found in a parallel computing environment, substantial reductions are obtained in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers can solve are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
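The HPT strategy rests on multilevel substructural decomposition. As a one-level illustration only (the tree construction and the interpolative reductions are not reproduced), the sketch below performs static condensation: interior unknowns of a substructure are eliminated through the Schur complement S = K_bb - K_bi K_ii^-1 K_ib, the smaller boundary system is solved, and the interior values are recovered. The 4x4 test matrix, the interior/boundary split, and the function names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def sub(A, rows, cols):
    return [[A[r][c] for c in cols] for r in rows]


def condense_and_solve(K, f, interior, boundary):
    """Solve K u = f by eliminating `interior` DOFs onto `boundary` DOFs."""
    ni, nb = len(interior), len(boundary)
    Kii, Kib = sub(K, interior, interior), sub(K, interior, boundary)
    Kbi, Kbb = sub(K, boundary, interior), sub(K, boundary, boundary)
    fi = [f[i] for i in interior]
    fb = [f[b] for b in boundary]
    # X[c] = Kii^-1 (column c of Kib); y = Kii^-1 fi
    X = [solve(Kii, [Kib[r][c] for r in range(ni)]) for c in range(nb)]
    y = solve(Kii, fi)
    # Condensed (Schur complement) system S ub = g on the boundary.
    S = [[Kbb[r][c] - sum(Kbi[r][k] * X[c][k] for k in range(ni)) for c in range(nb)]
         for r in range(nb)]
    g = [fb[r] - sum(Kbi[r][k] * y[k] for k in range(ni)) for r in range(nb)]
    ub = solve(S, g)
    # Back-substitute for the interior: ui = y - Kii^-1 Kib ub.
    ui = [y[k] - sum(X[c][k] * ub[c] for c in range(nb)) for k in range(ni)]
    u = [0.0] * len(f)
    for idx, dof in enumerate(interior):
        u[dof] = ui[idx]
    for idx, dof in enumerate(boundary):
        u[dof] = ub[idx]
    return u


if __name__ == "__main__":
    K = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
    print(condense_and_solve(K, [1.0] * 4, interior=[0, 3], boundary=[1, 2]))
```

Because each substructure's interior elimination is independent of the others, this step parallelizes naturally; applying it recursively to the condensed systems gives a tree of substructures.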
Multiple grid problems on concurrent-processing computers

NASA Technical Reports Server (NTRS)

Eberhardt, D. S.; Baganoff, D.

1986-01-01

Three computer codes were studied which make use of concurrent processing computer architectures in computational fluid dynamics (CFD). The three parallel codes were tested on a two-processor multiple-instruction/multiple-data (MIMD) facility at NASA Ames Research Center, and are suggested for efficient parallel computations. The first code is a well-known program which makes use of the Beam-Warming implicit, approximately factored algorithm. This study demonstrates the parallelism inherent in this well-known scheme, which achieved speedups exceeding 1.9 on the two-processor MIMD test facility. The second code studied made use of an embedded grid scheme which is used to solve problems having complex geometries. The particular application for this study considered an airfoil/flap geometry in an incompressible flow. The scheme eliminates some of the inherent difficulties found in adapting approximate factorization techniques onto MIMD machines and allows the use of chaotic relaxation and asynchronous iteration techniques. The third code studied is an application of overset grids to a supersonic blunt body problem. The code addresses the difficulties encountered when using embedded grids on a compressible, and therefore nonlinear, problem. The complex numerical boundary system associated with overset grids is discussed and several boundary schemes are suggested. A boundary scheme based on the method of characteristics achieved the best results.
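The parallelism in the first code comes from approximate factorization: the implicit operator is split into one-dimensional factors, and each factor is inverted by independent tridiagonal solves along grid lines, which can be distributed across processors. The sketch below shows this on the 2D heat equation as a stand-in for the Beam-Warming flow equations (which lead to block-tridiagonal systems); the coefficients rx and ry, the zero values outside the grid, and the function names are assumptions, and the loop over lines is written serially.

```python
def thomas(a, b, c, d):
    """Tridiagonal solve (a: sub-diagonal, b: diagonal, c: super-diagonal)."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_lines(u, r):
    """Apply (I - r*delta_2)^-1 along every row of u (u = 0 beyond the ends).

    Each row is an independent tridiagonal solve, so rows could be
    handed to different processors.
    """
    out = []
    for row in u:
        n = len(row)
        out.append(thomas([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, list(row)))
    return out


def transpose(u):
    return [list(col) for col in zip(*u)]


def af_step(u, rx, ry):
    """One approximately factored implicit step for u_t = u_xx + u_yy:
    (I - rx*dxx)(I - ry*dyy) u_new = u_old, with rx = dt/dx**2, ry = dt/dy**2.
    """
    w = solve_lines(u, rx)  # x-sweep: one solve per grid row
    return transpose(solve_lines(transpose(w), ry))  # y-sweep: one per column


if __name__ == "__main__":
    u = [[0.0] * 5 for _ in range(5)]
    u[2][2] = 1.0
    for row in af_step(u, 0.5, 0.5):
        print([round(v, 4) for v in row])
```

Factoring replaces one 2D implicit solve with two sets of 1D solves at the cost of an extra O(dt^2) splitting error, which is the usual trade-off accepted in approximate factorization.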
Continuous-time ΣΔ ADC with implicit variable gain amplifier for CMOS image sensor.

PubMed

Tang, Fang; Bermak, Amine; Abbes, Amira; Benammar, Mohieddine Amor

2014-01-01

This paper presents a column-parallel continuous-time sigma delta (CTSD) ADC for mega-pixel resolution CMOS image sensor (CIS). The sigma delta modulator is implemented with a 2nd order resistor/capacitor-based loop filter. The first integrator uses a conventional operational transconductance amplifier (OTA) to ensure high power noise rejection. The second integrator is realized with a single-ended inverter-based amplifier, instead of a standard OTA. As a result, the power consumption is reduced, without sacrificing the noise performance. Moreover, the variable gain amplifier in the traditional column-parallel read-out circuit is merged into the front-end of the CTSD modulator. By programming the input resistance, the amplitude range of the input current can be tuned with 8 scales, which is equivalent to a traditional 2-bit preamplification function without consuming extra power and chip area. The test chip prototype is fabricated using 0.18 μm CMOS process and the measurement result shows an ADC power consumption lower than 63.5 μW under 1.4 V power supply and 50 MHz clock frequency.
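The modulator in this paper is continuous-time with an RC loop filter, so the following is only a behavioral stand-in: a discrete-time, second-order, 1-bit sigma-delta modulator in which the signal passes through unchanged while quantization noise is shaped by (1 - z^-1)^2. The `gain` argument is a hypothetical stand-in for the programmable input resistance that merges the variable gain amplifier into the modulator front-end; it is not the paper's circuit.

```python
def sigma_delta_mod2(x, gain=1.0):
    """Discrete-time second-order 1-bit sigma-delta modulator.

    x: input samples; keep |gain * x| well below 1 for stability.
    gain: hypothetical stand-in for programmable input scaling.
    With the previous output fed back to both integrators,
    Y(z) = gain*X(z) + (1 - z^-1)^2 E(z): the signal is passed unchanged
    and the quantization error E is second-order high-pass shaped.
    """
    v1 = v2 = 0.0
    y = 0.0  # previous 1-bit output
    out = []
    for s in x:
        v1 += gain * s - y   # first integrator
        v2 += v1 - y         # second integrator
        y = 1.0 if v2 >= 0.0 else -1.0  # 1-bit quantizer
        out.append(y)
    return out


if __name__ == "__main__":
    bits = sigma_delta_mod2([0.3] * 4000)
    # Averaging (a crude decimation filter) recovers the DC input.
    print(sum(bits) / len(bits))
```

The plain average used here is the simplest possible decimation filter; a real ADC follows the modulator with a proper digital low-pass and decimation stage.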
Perceptual other-race training reduces implicit racial bias.

PubMed

Lebrecht, Sophie; Pierce, Lara J; Tarr, Michael J; Tanaka, James W

2009-01-01

Implicit racial bias denotes socio-cognitive attitudes towards other-race groups that are exempt from conscious awareness. In parallel, other-race faces are more difficult to differentiate relative to own-race faces, a phenomenon known as the "Other-Race Effect." To examine the relationship between these two biases, we trained Caucasian subjects to better individuate other-race faces and measured implicit racial bias for those faces both before and after training. Two groups of Caucasian subjects were exposed equally to the same African American faces in a training protocol run over 5 sessions. In the individuation condition, subjects learned to discriminate between African American faces. In the categorization condition, subjects learned to categorize faces as African American or not. In both conditions, before and after training, we measured the Other-Race Effect using old-new recognition, and implicit racial bias using a novel implicit social measure, the "Affective Lexical Priming Score" (ALPS). Subjects in the individuation condition, but not in the categorization condition, showed improved discrimination of African American faces with training. Concomitantly, subjects in the individuation condition, but not the categorization condition, showed a reduction in their ALPS. Critically, for the individuation condition only, the degree to which an individual subject's ALPS decreased was significantly correlated with the degree of improvement that subject showed in their ability to differentiate African American faces. Our results establish a causal link between the Other-Race Effect and implicit racial bias. We demonstrate that training that ameliorates the perceptual Other-Race Effect also reduces socio-cognitive implicit racial bias. These findings suggest that implicit racial biases are multifaceted, and include malleable perceptual skills that can be modified with relatively little training.
Block Preconditioning to Enable Physics-Compatible Implicit Multifluid Plasma Simulations

NASA Astrophysics Data System (ADS)

Phillips, Edward; Shadid, John; Cyr, Eric; Miller, Sean

2017-10-01

Multifluid plasma simulations involve large systems of partial differential equations in which many time-scales ranging over many orders of magnitude arise. Since the fastest of these time-scales may set a restrictively small time-step limit for explicit methods, the use of implicit or implicit-explicit time integrators can be more tractable for obtaining dynamics at time-scales of interest. Furthermore, to enforce properties such as charge conservation and divergence-free magnetic field, mixed discretizations using volume, nodal, edge-based, and face-based degrees of freedom are often employed in some form. Together with the presence of stiff modes due to integrating over fast time-scales, the mixed discretization makes the required linear solves for implicit methods particularly difficult for black box and monolithic solvers. This work presents a block preconditioning strategy for multifluid plasma systems that segregates the linear system based on discretization type and approximates off-diagonal coupling in block diagonal Schur complement operators. By employing multilevel methods for the block diagonal subsolves, this strategy yields algorithmic and parallel scalability which we demonstrate on a range of problems.
Sierra/Solid Mechanics 4.48 User's Guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Merewether, Mark Thomas; Crane, Nathan K; de Frias, Gabriel Jose

Sierra/SolidMechanics (Sierra/SM) is a Lagrangian, three-dimensional code for finite element analysis of solids and structures. It provides capabilities for explicit dynamic, implicit quasistatic and dynamic analyses. The explicit dynamics capabilities allow for the efficient and robust solution of models with extensive contact subjected to large, suddenly applied loads. For implicit problems, Sierra/SM uses a multi-level iterative solver, which enables it to effectively solve problems with large deformations, nonlinear material behavior, and contact. Sierra/SM has a versatile library of continuum and structural elements, and a large library of material models. The code is written for parallel computing environments enabling scalable solutions of extremely large problems for both implicit and explicit analyses. It is built on the SIERRA Framework, which facilitates coupling with other SIERRA mechanics codes. This document describes the functionality and input syntax for Sierra/SM.
Strategic processing in long-term repetition priming in the lexical decision task.

PubMed

Kessler, Yoav; Moscovitch, Morris

2013-04-01

In a lexical decision task, faster reaction times (RTs) for old than for new items are taken as evidence of implicit memory involvement in this task. In contrast, the present study shows the involvement of both implicit and explicit memory in repetition priming. We propose a dual-route model, in which lexical decisions can be made using one of two parallel processing routes: a lexical route, in which the lexical properties of the stimulus are used to determine whether it is a word or not, and a strategic route that builds on the inherent correlation between "wordness" and "oldness" in the experiment. Eliminating the strategic route by removing this correlation diminishes the priming effect at the slow end of the RT distribution, but not at the fast end. This dissociation is interpreted as evidence for the involvement of both implicit and explicit memory in repetition priming.
A matrix-free implicit unstructured multigrid finite volume method for simulating structural dynamics and fluid structure interaction

NASA Astrophysics Data System (ADS)

Lv, X.; Zhao, Y.; Huang, X. Y.; Xia, G. H.; Su, X. H.

2007-07-01

A new three-dimensional (3D) matrix-free implicit unstructured multigrid finite volume (FV) solver for structural dynamics is presented in this paper. The solver is first validated using classical 2D and 3D cantilever problems. It is shown that very accurate predictions of the fundamental natural frequencies of the problems can be obtained by the solver with fast convergence rates. This method has been integrated into our existing FV compressible solver [X. Lv, Y. Zhao, et al., An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3d unsteady compressible flows with moving objects, Journal of Computational Physics 215(2) (2006) 661-690] based on the immersed membrane method (IMM) [X. Lv, Y. Zhao, et al., as mentioned above]. Results for the interaction between the fluid and an immersed fixed-free cantilever are also presented to demonstrate the potential of this integrated fluid-structure interaction approach.
Group implicit concurrent algorithms in nonlinear structural dynamics

NASA Technical Reports Server (NTRS)

Ortiz, M.; Sotelino, E. D.

1989-01-01

During the 1970s and 1980s, considerable effort was devoted to developing efficient and reliable time-stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problem are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of application are those that accurately integrate the low-frequency content of the response without necessitating the resolution of the high-frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithm development area in recent years has been the advent of parallel computers with multiprocessing capabilities. This work is therefore mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time-stepping procedures that lend themselves to an efficient implementation in concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, Group Implicit (GI) algorithms, is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse-grain as well as medium-grain parallel computers.

White Matter Microstructural Correlates of Superior Long-term Skill Gained Implicitly under Randomized Practice

PubMed Central

Song, Sunbin; Sharma, Nikhil; Buch, Ethan R.

2012-01-01

We value skills we have learned intentionally, but equally important are skills acquired incidentally without the ability to describe how or what is learned, referred to as implicit. Randomized practice schedules are superior to grouped schedules for long-term skill gained intentionally, but their relevance for implicit learning is not known. In a parallel design, we studied healthy subjects who learned a motor sequence implicitly under a randomized or grouped practice schedule and obtained diffusion-weighted images to identify white matter microstructural correlates of long-term skill. Randomized practice led to superior long-term skill compared with grouped practice. Whole-brain analyses relating interindividual variability in fractional anisotropy (FA) to long-term skill demonstrated that 1) skill in randomized learners correlated with FA within the corticostriatal tract connecting left sensorimotor cortex to posterior putamen, while 2) skill in grouped learners correlated with FA within the right forceps minor connecting homologous regions of the prefrontal cortex (PFC) and the corticostriatal tract connecting lateral PFC to anterior putamen. These results demonstrate first that randomized practice schedules improve long-term implicit skill more than grouped practice schedules and, second, that the superior skill acquired through randomized practice can be related to white matter microstructure in the sensorimotor corticostriatal network. PMID:21914632
A discrimination-association model for decomposing component processes of the implicit association test.

PubMed

Stefanutti, Luca; Robusto, Egidio; Vianello, Michelangelo; Anselmi, Pasquale

2013-06-01

A formal model is proposed that decomposes the implicit association test (IAT) effect into three process components: stimuli discrimination, automatic association, and termination criterion. Both response accuracy and reaction time are considered. Four independent and parallel Poisson processes, one for each of the four label categories of the IAT, are assumed. The model parameters are the rate at which information accrues on the counter of each process and the amount of information that is needed before a response is given. The aim of this study is to present the model and an illustrative application in which the process components of a Coca-Pepsi IAT are decomposed.
Racial Categorization Predicts Implicit Racial Bias in Preschool Children.

PubMed

Setoh, Peipei; Lee, Kristy J J; Zhang, Lijun; Qian, Miao K; Quinn, Paul C; Heyman, Gail D; Lee, Kang

2017-06-12

This research investigated the relation between racial categorization and implicit racial bias in majority and minority children. Chinese and Indian 3- to 7-year-olds from Singapore (N = 158) categorized Chinese and Indian faces by race and had their implicit and explicit racial biases measured. Majority Chinese children, but not minority Indian children, showed implicit bias favoring own race. Regardless of ethnicity, children's racial categorization performance correlated positively with implicit racial bias. Also, Chinese children, but not Indian children, displayed explicit bias favoring own race. Furthermore, children's explicit bias was unrelated to racial categorization performance and implicit bias. The findings support a perceptual-social linkage in the emergence of implicit racial bias and have implications for designing programs to promote interracial harmony. © 2017 The Authors. Child Development © 2017 Society for Research in Child Development, Inc.
Parallel Semi-Implicit Spectral Element Atmospheric Model

NASA Astrophysics Data System (ADS)

Fournier, A.; Thomas, S.; Loft, R.

2001-05-01

The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain the essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1°). The code derivation involves a weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. An appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicolson schemes for the nonlinear and linear terms, respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures: derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GFLOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
CFD Analysis and Design Optimization Using Parallel Computers

NASA Technical Reports Server (NTRS)

Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James

1997-01-01

A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
MPSalsa Version 1.5: A Finite Element Computer Program for Reacting Flow Problems: Part 1 - Theoretical Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devine, K.D.; Hennigan, G.L.; Hutchinson, S.A.

1999-01-01

The theoretical background for the finite element computer program MPSalsa Version 1.5 is presented in detail. MPSalsa is designed to solve laminar or turbulent low Mach number, two- or three-dimensional incompressible and variable-density reacting fluid flows on massively parallel computers, using a Petrov-Galerkin finite element formulation. The code has the capability to solve coupled fluid flow (with auxiliary turbulence equations), heat transport, multicomponent species transport, and finite-rate chemical reactions, and to solve coupled multiple Poisson or advection-diffusion-reaction equations. The program employs the CHEMKIN library to provide a rigorous treatment of multicomponent ideal gas kinetics and transport. Chemical reactions occurring in the gas phase and on surfaces are treated by calls to CHEMKIN and SURFACE CHEMKIN, respectively. The code employs unstructured meshes, using the EXODUS II finite element database suite of programs for its input and output files. MPSalsa solves both transient and steady flows by using fully implicit time integration, an inexact Newton method, and iterative solvers based on preconditioned Krylov methods as implemented in the Aztec solver library.
A Framework for Integrating Implicit Bias Recognition Into Health Professions Education.

PubMed

Sukhera, Javeed; Watling, Chris

2018-01-01

Existing literature on implicit bias is fragmented and comes from a variety of fields like cognitive psychology, business ethics, and higher education, but implicit-bias-informed educational approaches have been underexplored in health professions education and are difficult to evaluate using existing tools. Despite increasing attention to implicit bias recognition and management in health professions education, many programs struggle to meaningfully integrate these topics into curricula. The authors propose a six-point actionable framework for integrating implicit bias recognition and management into health professions education that draws on the work of previous researchers and includes practical tools to guide curriculum developers. The six key features of this framework are creating a safe and nonthreatening learning context, increasing knowledge about the science of implicit bias, emphasizing how implicit bias influences behaviors and patient outcomes, increasing self-awareness of existing implicit biases, improving conscious efforts to overcome implicit bias, and enhancing awareness of how implicit bias influences others. Important considerations for designing implicit-bias-informed curricula (such as individual and contextual variables, as well as formal and informal cultural influences) are discussed. The authors also outline assessment and evaluation approaches that consider outcomes at individual, organizational, community, and societal levels. The proposed framework may facilitate future research and exploration regarding the use of implicit bias in health professions education.
Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?

NASA Astrophysics Data System (ADS)

Zhou, Ruhong; Berne, Bruce J.

2002-10-01

The folding free energy landscape of the C-terminal β-hairpin of protein G is explored using the surface-generalized Born (SGB) implicit solvent model, and the results are compared with the landscape from an earlier study with explicit solvent model. The OPLSAA force field is used for the β-hairpin in both implicit and explicit solvent simulations, and the conformational space sampling is carried out with a highly parallel replica-exchange method. Surprisingly, we find from exhaustive conformation space sampling that the free energy landscape from the implicit solvent model is quite different from that of the explicit solvent model. In the implicit solvent model some nonnative states are heavily overweighted, and more importantly, the lowest free energy state is no longer the native β-strand structure. An overly strong salt-bridge effect between charged residues (E42, D46, D47, E56, and K50) is found to be responsible for this behavior in the implicit solvent model. Despite this, we find that the OPLSAA/SGB energies of all the nonnative structures are higher than that of the native structure; thus the OPLSAA/SGB energy is still a good scoring function for structure prediction for this β-hairpin. Furthermore, the β-hairpin population at 282 K is found to be less than 40% from the implicit solvent model, which is much smaller than the 72% from the explicit solvent model and ≈80% from experiment. On the other hand, both implicit and explicit solvent simulations with the OPLSAA force field exhibit no meaningful helical content during the folding process, which is in contrast to some very recent studies using other force fields.
Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?

PubMed Central

Zhou, Ruhong; Berne, Bruce J.

2002-01-01

The folding free energy landscape of the C-terminal β-hairpin of protein G is explored using the surface-generalized Born (SGB) implicit solvent model, and the results are compared with the landscape from an earlier study with explicit solvent model. The OPLSAA force field is used for the β-hairpin in both implicit and explicit solvent simulations, and the conformational space sampling is carried out with a highly parallel replica-exchange method. Surprisingly, we find from exhaustive conformation space sampling that the free energy landscape from the implicit solvent model is quite different from that of the explicit solvent model. In the implicit solvent model some nonnative states are heavily overweighted, and more importantly, the lowest free energy state is no longer the native β-strand structure. An overly strong salt-bridge effect between charged residues (E42, D46, D47, E56, and K50) is found to be responsible for this behavior in the implicit solvent model. Despite this, we find that the OPLSAA/SGB energies of all the nonnative structures are higher than that of the native structure; thus the OPLSAA/SGB energy is still a good scoring function for structure prediction for this β-hairpin. Furthermore, the β-hairpin population at 282 K is found to be less than 40% from the implicit solvent model, which is much smaller than the 72% from the explicit solvent model and ≈80% from experiment. On the other hand, both implicit and explicit solvent simulations with the OPLSAA force field exhibit no meaningful helical content during the folding process, which is in contrast to some very recent studies using other force fields. PMID:12242327
Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?

PubMed

Zhou, Ruhong; Berne, Bruce J

2002-10-01

The folding free energy landscape of the C-terminal beta-hairpin of protein G is explored using the surface-generalized Born (SGB) implicit solvent model, and the results are compared with the landscape from an earlier study with explicit solvent model. The OPLSAA force field is used for the beta-hairpin in both implicit and explicit solvent simulations, and the conformational space sampling is carried out with a highly parallel replica-exchange method. Surprisingly, we find from exhaustive conformation space sampling that the free energy landscape from the implicit solvent model is quite different from that of the explicit solvent model. In the implicit solvent model some nonnative states are heavily overweighted, and more importantly, the lowest free energy state is no longer the native beta-strand structure. An overly strong salt-bridge effect between charged residues (E42, D46, D47, E56, and K50) is found to be responsible for this behavior in the implicit solvent model. Despite this, we find that the OPLSAA/SGB energies of all the nonnative structures are higher than that of the native structure; thus the OPLSAA/SGB energy is still a good scoring function for structure prediction for this beta-hairpin. Furthermore, the beta-hairpin population at 282 K is found to be less than 40% from the implicit solvent model, which is much smaller than the 72% from the explicit solvent model and approximately 80% from experiment. On the other hand, both implicit and explicit solvent simulations with the OPLSAA force field exhibit no meaningful helical content during the folding process, which is in contrast to some very recent studies using other force fields.
Nonequilibrium thermo-chemical calculations using a diagonal implicit scheme

NASA Technical Reports Server (NTRS)

Imlay, Scott T.; Roberts, Donald W.; Soetrisno, Moeljo; Eberhardt, Scott

1991-01-01

A recently developed computer program for hypersonic vehicle flow analysis is described. The program uses a diagonal implicit algorithm to solve the equations of viscous flow for a gas in thermochemical nonequilibrium. The diagonal scheme eliminates the expense of inverting large block matrices that arise when species conservation equations are introduced. The program uses multiple zones of grids patched together and includes radiation wall and rarefied-gas boundary conditions. Solutions are presented for hypersonic flows of air and hydrogen-air mixtures.
Implicit Dual Control Based on Particle Filtering and Forward Dynamic Programming.

PubMed

Bayard, David S; Schumitzky, Alan

2010-03-01

This paper develops a sampling-based approach to implicit dual control. Implicit dual control methods synthesize stochastic control policies by systematically approximating the stochastic dynamic programming equations of Bellman, in contrast to explicit dual control methods that artificially induce probing into the control law by modifying the cost function to include a term that rewards learning. The proposed implicit dual control approach is novel in that it combines a particle filter with a policy-iteration method for forward dynamic programming. The integration of the two methods provides a complete sampling-based approach to the problem. Implementation of the approach is simplified by making use of a specific architecture denoted as an H-block. Practical suggestions are given for reducing computational loads within the H-block for real-time applications. As an example, the method is applied to the control of a stochastic pendulum model having unknown mass, length, initial position and velocity, and unknown sign of its dc gain. Simulation results indicate that active controllers based on the described method can systematically improve closed-loop performance with respect to other more common stochastic control approaches.
Application of a lower-upper implicit scheme and an interactive grid generation for turbomachinery flow field simulations

NASA Technical Reports Server (NTRS)

Choo, Yung K.; Soh, Woo-Yung; Yoon, Seokkwan

1989-01-01

A finite-volume lower-upper (LU) implicit scheme is used to simulate an inviscid flow in a turbine cascade. This approximate factorization scheme requires only the inversion of sparse lower and upper triangular matrices, which can be done efficiently without extensive storage. As an implicit scheme, it allows a large time step to reach the steady state. An interactive grid generation program (TURBO), which is under development, is used to generate grids. This program uses the control-point form of algebraic grid generation, in which a sparse collection of control points is used to adjust the shape and position of coordinate curves. A distinct advantage of TURBO compared with other grid generation programs is that it allows easy changes to the local mesh structure without affecting the grid outside the domain of independence. Sample grids are generated by TURBO for a compressor rotor blade and a turbine cascade. The turbine cascade flow is simulated by using the LU implicit scheme on the grid generated by TURBO.
On the performance of explicit and implicit algorithms for transient thermal analysis

NASA Astrophysics Data System (ADS)

Adelman, H. M.; Haftka, R. T.

1980-09-01

The status of an effort to increase the efficiency of calculating transient temperature fields in complex aerospace vehicle structures is described. The advantages and disadvantages of explicit and implicit algorithms are discussed. A promising set of implicit algorithms, known as the GEAR package, is described. Four test problems, used for evaluating and comparing various algorithms, have been selected, and finite element models of the configurations are described. These problems include a space shuttle frame component, an insulated cylinder, a metallic panel for a thermal protection system, and a model of the space shuttle orbiter wing. Calculations were carried out using the SPAR finite element program, the MITAS lumped-parameter program, and a special-purpose finite element program incorporating the GEAR algorithms. Results generally indicate a preference for implicit over explicit algorithms for the solution of transient structural heat transfer problems when the governing equations are stiff. Careful attention to modeling detail, such as avoiding thin or short high-conducting elements, can sometimes reduce the stiffness to the extent that explicit methods become advantageous.
Advancing parabolic operators in thermodynamic MHD models: Explicit super time-stepping versus implicit schemes with Krylov solvers

NASA Astrophysics Data System (ADS)

Caplan, R. M.; Mikić, Z.; Linker, J. A.; Lionello, R.

2017-05-01

We explore the performance and advantages/disadvantages of using unconditionally stable explicit super time-stepping (STS) algorithms versus implicit schemes with Krylov solvers for integrating parabolic operators in thermodynamic MHD models of the solar corona. Specifically, we compare the second-order Runge-Kutta Legendre (RKL2) STS method with the implicit backward Euler scheme computed using the preconditioned conjugate gradient (PCG) solver with both a point-Jacobi and a non-overlapping domain decomposition ILU0 preconditioner. The algorithms are used to integrate anisotropic Spitzer thermal conduction and artificial kinematic viscosity at time-steps much larger than classic explicit stability criteria allow. A key component of the comparison is the use of an established MHD model (MAS) to compute a real-world simulation on a large HPC cluster. Special attention is placed on the parallel scaling of the algorithms. It is shown that, for a specific problem and model, the RKL2 method is comparable to or surpasses the implicit method with PCG solvers in performance and scaling, but suffers from some accuracy limitations. These limitations and the applicability of RKL methods are briefly discussed.
Compiled MPI: Cost-Effective Exascale Applications Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bronevetsky, G; Quinlan, D; Lumsdaine, A

2012-04-10

The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has a significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model, to enable parallelism. MPI provides a portable and high-performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application.
We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) a new set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) a compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) a novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most of the more complex code annotations.
Mapping implicit spectral methods to distributed memory architectures

NASA Technical Reports Server (NTRS)

Overman, Andrea L.; Vanrosendale, John

1991-01-01

Spectral methods have proven invaluable in the numerical simulation of PDEs (partial differential equations), but the frequent global communication they require raises a fundamental barrier to their use on highly parallel architectures. To explore this issue, a 3-D implicit spectral method was implemented on an Intel hypercube. Utilization of about 50 percent was achieved on a 32-node iPSC/860 hypercube for a 64 x 64 x 64 Fourier-spectral grid; finer grids yield higher utilizations. Chebyshev-spectral grids are more problematic, since plane-relaxation-based multigrid is required. However, by using a semicoarsening multigrid algorithm and by relaxing all multigrid levels concurrently, relatively high utilizations were also achieved in this harder case.
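The communication issue described above can be seen in a minimal 1D periodic sketch of an implicit Fourier-spectral step: after the forward transform, the backward Euler system for diffusion is diagonal, so each mode is solved locally, while the transforms themselves couple every grid point. A naive O(n²) DFT stands in for an FFT, and the function names and the diffusion test problem are assumptions for illustration, not the hypercube code's.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def implicit_spectral_diffusion_step(u, nu, dt, length):
    """One backward Euler step of u_t = nu * u_xx on a periodic 1D grid.

    In Fourier space the implicit system is diagonal,
        (1 + nu*dt*k^2) * U_new(k) = U_old(k),
    so each mode is solved independently; only the transforms are global.
    """
    n = len(u)
    U = dft(u)
    for k in range(n):
        kk = k if k <= n // 2 else k - n       # signed wavenumber index
        wave = 2.0 * math.pi * kk / length
        U[k] /= 1.0 + nu * dt * wave * wave
    return [z.real for z in idft(U)]
```

For a single mode such as sin(x) on [0, 2π), one step with nu = 1 and dt = 0.5 simply divides the profile by 1.5. In a distributed 3-D code the per-mode division is embarrassingly parallel, whereas the transforms require all-to-all data exchange, which is why utilization improves as the grid (and thus the local work per message) grows.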
Interprofessional Collaboration and Turf Wars: How Prevalent Are Hidden Attitudes?

PubMed Central

Chung, Chadwick L. R.; Manga, Jasmin; McGregor, Marion; Michailidis, Christos; Stavros, Demetrios; Woodhouse, Linda J.

2012-01-01

Purpose: Interprofessional collaboration in health care is believed to enhance patient outcomes. However, where professions have overlapping scopes of practice (e.g., chiropractors and physical therapists), "turf wars" can hinder effective collaboration. Deep-rooted beliefs, identified as implicit attitudes, provide a potential explanation. Even with positive explicit attitudes toward a social group, negative stereotypes may be influential. Previous studies on interprofessional attitudes have mostly used qualitative research methodologies. This study used quantitative methods to evaluate explicit and implicit attitudes of physical therapy students toward chiropractic. Methods: A paper-and-pencil instrument was developed and administered to 49 individuals (students and faculty) associated with a Canadian university master's entry-level physical therapy program after approval by the Research Ethics Board. The instrument evaluated explicit and implicit attitudes toward the chiropractic profession. Implicit attitudes were determined by comparing response times of chiropractic paired with positive versus negative descriptors. Results: Mean time to complete a word association task was significantly longer (t = 4.75, p < .01) when chiropractic was associated with positive rather than negative words. Explicit and implicit attitudes were not correlated (r = 0.13, p = .38). Conclusions: While little explicit bias existed, individuals associated with a master's entry-level physical therapy program appeared to have a significant negative implicit bias toward chiropractic. PMID: 22778528
Interprofessional collaboration and turf wars: how prevalent are hidden attitudes?

PubMed

Chung, Chadwick L R; Manga, Jasmin; McGregor, Marion; Michailidis, Christos; Stavros, Demetrios; Woodhouse, Linda J

2012-01-01

Interprofessional collaboration in health care is believed to enhance patient outcomes. However, where professions have overlapping scopes of practice (e.g., chiropractors and physical therapists), "turf wars" can hinder effective collaboration. Deep-rooted beliefs, identified as implicit attitudes, provide a potential explanation. Even with positive explicit attitudes toward a social group, negative stereotypes may be influential. Previous studies on interprofessional attitudes have mostly used qualitative research methodologies. This study used quantitative methods to evaluate explicit and implicit attitudes of physical therapy students toward chiropractic. A paper-and-pencil instrument was developed and administered to 49 individuals (students and faculty) associated with a Canadian university master's entry-level physical therapy program after approval by the Research Ethics Board. The instrument evaluated explicit and implicit attitudes toward the chiropractic profession. Implicit attitudes were determined by comparing response times of chiropractic paired with positive versus negative descriptors. Mean time to complete a word association task was significantly longer (t = 4.75, p < .01) when chiropractic was associated with positive rather than negative words. Explicit and implicit attitudes were not correlated (r = 0.13, p = .38). While little explicit bias existed, individuals associated with a master's entry-level physical therapy program appeared to have a significant negative implicit bias toward chiropractic.
AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

NASA Astrophysics Data System (ADS)

Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

2017-05-01

We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional, self-consistent, grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general-purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between system and GPU memory, launches CUDA kernels, and writes simulation outputs to disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving the hybrid model equations, are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an existing, well-tested hybrid model of plasma that runs in parallel on multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s equation, resulting in an explicit-implicit scheme for the hybrid model equations. We show that the proposed scheme is stable and accurate. We examine energy conservation in AMITIS and show that energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.
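The abstract does not state which particle-to-grid weighting AMITIS uses. Cloud-in-cell (linear) weighting is a common choice in hybrid and PIC codes, so the sketch below assumes it, on a 1D periodic grid with nodes at x = i*dx, to show what "calculating macroscopic properties of particles on a grid" involves. The function name and grid layout are hypothetical; on a GPU, one common approach assigns a thread per particle and accumulates into the grid with atomic additions, whereas this serial loop only shows the arithmetic.

```python
import math


def deposit_cic(positions, weights, n_cells, dx):
    """Cloud-in-cell (linear) deposition of particle weights onto a periodic
    1D grid whose nodes sit at x = i*dx. Returns density (weight per length)."""
    rho = [0.0] * n_cells
    for x, w in zip(positions, weights):
        s = x / dx
        i = math.floor(s)          # grid node to the left of the particle
        frac = s - i               # fractional distance from that node
        rho[i % n_cells] += w * (1.0 - frac) / dx
        rho[(i + 1) % n_cells] += w * frac / dx
    return rho
```

A particle at x = 1.25 with dx = 0.5 sits halfway between nodes 2 and 3 and splits its weight equally between them; summing `rho * dx` over the grid always recovers the total particle weight, which is the conservation property moment calculations rely on.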

Independence between implicit and explicit processing as revealed by the Simon effect.

PubMed

Lo, Shih-Yu; Yeh, Su-Ling

2011-09-01

Studies showing human behavior influenced by subliminal stimuli mainly focus on implicit processing per se, and little is known about its interaction with explicit processing. We examined this by using the Simon effect, wherein a task-irrelevant spatial distracter interferes with a lateralized response. Lo and Yeh (2008) found that the visual Simon effect, although it occurred when participants were aware of the visual distracters, did not occur with subliminal visual distracters. We used the same paradigm and examined whether subliminal and supra-threshold stimuli are processed independently by adding a supra-threshold auditory distracter to ascertain whether it would interact with the subliminal visual distracter. Results showed an auditory Simon effect, but there was still no visual Simon effect, indicating that supra-threshold and subliminal stimuli are processed separately in independent streams. In contrast to the traditional view that implicit processing precedes explicit processing, our results suggest that they operate independently in a parallel fashion. Copyright © 2010 Elsevier Inc. All rights reserved.
Towards full-Braginskii implicit extended MHD

NASA Astrophysics Data System (ADS)

Chacon, Luis

2009-05-01

Recently, viable algorithms have been proposed for the scalable, fully-implicit temporal integration of 3D resistive MHD and cold-ion extended MHD models. While significant, these achievements must be tempered by the fact that such models lack predictive capabilities in regimes of interest for magnetic fusion. Short of including kinetic closures, a natural evolution path towards predictability starts by considering additional terms as described in Braginskii's fluid closures in the collisional regime. Here, we focus on the inclusion of two fundamental elements of relevance for fusion plasmas: anisotropic parallel electron transport, and warm-ion physics (i.e., ion finite Larmor radius effects, included via gyroviscosity). Both these elements introduce significant numerical difficulties, due to the strong anisotropy in the former, and the presence of dispersive waves in the latter. In this presentation, we will discuss progress in our fully implicit algorithmic formulation towards the inclusion of both these elements. L. Chacón, Phys. Plasmas, 15, 056103 (2008); L. Chacón, J. Physics: Conf. Series, 125, 012041 (2008)
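The "strong anisotropy" of parallel electron transport can be written in the usual tensor form. The expression below is a simplified version offered as background, not as the formulation used in this work: it keeps only the parallel and perpendicular conductivities, with $\hat{\mathbf{b}}$ the unit vector along the magnetic field, and omits the cross-field $\hat{\mathbf{b}}\times\nabla T_e$ term and the other contributions in Braginskii's full closure.

```latex
\[
\mathbf{q}_e = -\kappa_\parallel\,\hat{\mathbf{b}}\,(\hat{\mathbf{b}}\cdot\nabla T_e)
  - \kappa_\perp\left[\nabla T_e - \hat{\mathbf{b}}\,(\hat{\mathbf{b}}\cdot\nabla T_e)\right]
  = -\boldsymbol{\kappa}\cdot\nabla T_e,
\qquad
\boldsymbol{\kappa} = \kappa_\perp\,\mathsf{I} + (\kappa_\parallel - \kappa_\perp)\,\hat{\mathbf{b}}\hat{\mathbf{b}} .
\]
```

When $\kappa_\parallel/\kappa_\perp$ is very large, small discretization errors in the parallel term leak into the perpendicular direction and can swamp the physical $\kappa_\perp$, which is one source of the numerical difficulty noted above.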
Analysis of Implicit Uncertain Systems. Part 1: Theoretical Framework

DTIC Science & Technology

1994-12-07

Analysis of Implicit Uncertain Systems Part I: Theoretical Framework. Fernando Paganini, John Doyle. December 7, 1994. Abstract: This paper ... model and a number of constraints relevant to the analysis problem under consideration. In Part I of this paper we propose a theoretical framework which ...
Implicit Learning Abilities Predict Treatment Response in Autism Spectrum Disorders

DTIC Science & Technology

2015-09-01

Award number: W81XWH-14-1-0261. Title: Implicit Learning Abilities Predict Treatment Response in Autism Spectrum Disorders. ... for Autism Spectrum Disorder (ASD), but almost half of the children do not make significant gains. Implicit learning skills are integral to ...
Iran's Implicit Philosophy of Education

ERIC Educational Resources Information Center

Bagheri Noaparast, Khosrow

2018-01-01

This paper aims to extract Iran's philosophy of education from two sources: the constitution and the course of practice in educational institutions. Regarding the first source, it is argued that, parallel to the two main threads of the constitution, Iran's main elements of philosophy of education are expected to be derived from (1) Islam and (2)…
Advances in the spatially distributed ages-w model: parallel computation, java connection framework (JCF) integration, and streamflow/nitrogen dynamics assessment

USDA-ARS's Scientific Manuscript database

AgroEcoSystem-Watershed (AgES-W) is a modular, Java-based spatially distributed model which implements hydrologic and water quality (H/WQ) simulation components under the Java Connection Framework (JCF) and the Object Modeling System (OMS) environmental modeling framework. AgES-W is implicitly scala...
Sleep Benefits in Parallel Implicit and Explicit Measures of Episodic Memory

ERIC Educational Resources Information Center

Weber, Frederik D.; Wang, Jing-Yi; Born, Jan; Inostroza, Marion

2014-01-01

Research in rats using preferences during exploration as a measure of memory has indicated that sleep is important for the consolidation of episodic-like memory, i.e., memory for an event bound into specific spatio-temporal context. How these findings relate to human episodic memory is unclear. We used spontaneous preferences during visual…
Verification and Planning Based on Coinductive Logic Programming

NASA Technical Reports Server (NTRS)

Bansal, Ajay; Min, Richard; Simon, Luke; Mallya, Ajay; Gupta, Gopal

2008-01-01

Coinduction is a powerful technique for reasoning about unfounded sets, unbounded structures, infinite automata, and interactive computations [6]. Where induction corresponds to least fixed point semantics, coinduction corresponds to greatest fixed point semantics. Recently, coinduction has been incorporated into logic programming and an elegant operational semantics developed for it [11, 12]. This operational semantics is the greatest fixed point counterpart of SLD resolution (SLD resolution imparts operational semantics to least fixed point-based computations) and is termed co-SLD resolution. In co-SLD resolution, a predicate goal p(t) succeeds if it unifies with one of its ancestor calls. In addition, rational infinite terms are allowed as arguments of predicates. Infinite terms are represented as solutions to unification equations, and the occurs check is omitted during the unification process. Coinductive Logic Programming (Co-LP) and co-SLD resolution can be used to elegantly perform model checking and planning. A combined SLD and co-SLD resolution-based LP system forms the common basis for planning, scheduling, verification, model checking, and constraint solving [9, 4]. This is achieved by amalgamating SLD resolution, co-SLD resolution, and constraint logic programming [13] in a single logic programming system. Given that parallelism in logic programs can be implicitly exploited [8], complex, compute-intensive applications (planning, scheduling, model checking, etc.) can be executed in parallel on multi-core machines. Parallel execution can yield speed-ups and also allow larger instances of the problems to be solved. In the remainder we elaborate on (i) how planning can be elegantly and efficiently performed under real-time constraints, (ii) how real-time systems can be elegantly and efficiently model-checked, and (iii) how hybrid systems can be verified in a combined system with both co-SLD and SLD resolution.
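The co-SLD success rule (a goal succeeds if it unifies with one of its ancestor calls) can be illustrated for ground goals, where unification reduces to equality. The sketch below is a hypothetical Python rendering, not the authors' logic programming system: it checks the coinductive predicate "an infinite path starts at this state" on a finite transition graph, and it omits rational terms and general unification.

```python
def coinductive_infinite_path(graph, state, ancestors=frozenset()):
    """Ground-goal sketch of co-SLD resolution for the coinductive predicate
    'an infinite path starts at state'.

    A call succeeds if it matches one of its ancestor calls (the coinductive
    hypothesis); otherwise it must be proved through some successor state.
    """
    if state in ancestors:
        return True                          # matches an ancestor call
    hypotheses = ancestors | {state}         # record this call as an ancestor
    return any(coinductive_infinite_path(graph, nxt, hypotheses)
               for nxt in graph.get(state, ()))
```

On a graph containing a cycle, an inductive (SLD) reading of the same rule would recurse forever around the cycle; treating the repeated call as a success is what lets the query terminate with the greatest-fixed-point answer, while states that can only reach dead ends still fail.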
Implementations of co-SLD resolution as well as preliminary implementations of the planning and verification applications have been developed [4]. Co-LP and Model Checking: The vast majority of properties that are to be verified can be classified into safety properties and liveness properties. It is well known within model checking that safety properties can be verified by reachability analysis, i.e., if a counter-example to the property exists, it can be finitely determined by enumerating all the reachable states of the Kripke structure.
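The reachability argument for safety properties amounts to a graph search that stops at the first state violating the property and reports the path to it as the counter-example. A minimal breadth-first sketch follows; the explicit-state representation (a `successors` function and an `is_bad` predicate) and the function name are assumptions for illustration, whereas the work described here does this through Co-LP rather than an explicit search loop.

```python
from collections import deque


def find_counterexample(initial, successors, is_bad):
    """Breadth-first reachability over a finite Kripke structure.

    Returns a shortest path from `initial` to a state violating the safety
    property, or None if no bad state is reachable (the property holds).
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if is_bad(s):
            path = []
            while s is not None:             # walk back to the initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:              # visit each state at most once
                parent[t] = s
                queue.append(t)
    return None
```

Because every reachable state is enumerated at most once, the search terminates on any finite structure, which is exactly why a safety violation, if one exists, is "finitely determined".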
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, T.

Progress in parallel/vector computers has driven us to develop suitable numerical integrators that utilize their computational power to the full extent while being independent of the size of the system to be integrated. Unfortunately, parallel versions of Runge-Kutta-type integrators are known to be not very efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems. Meanwhile, vector-mode use of the Picard-Chebyshev method (Fukushima 1997a, 1997b) will lead to an acceleration factor of order 1000 for smooth problems such as planetary/satellite orbit integration. The success of the multiple-correction PECE mode of the time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to shed new light on Miranker's so-called "pipelined predictor-corrector method", which is expected to lead to an acceleration factor of 3-4. We will review these directions and discuss future prospects.
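Why extrapolation methods parallelize well can be seen in a scalar sketch of a Gragg-Bulirsch-Stoer-style step: the modified-midpoint integrations for different substep counts do not depend on one another, and only the cheap extrapolation tableau is sequential. The step-count sequence, the fixed macro step and the function names are assumptions; this is not Ito and Fukushima's variable-step parallel scheme.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def modified_midpoint(f, t0, y0, big_h, n):
    """Gragg's modified midpoint rule over one macro step big_h with n substeps."""
    h = big_h / n
    z_prev, z = y0, y0 + h * f(t0, y0)
    for m in range(1, n):
        z_prev, z = z, z_prev + 2.0 * h * f(t0 + m * h, z)
    return 0.5 * (z + z_prev + h * f(t0 + big_h, z))


def extrapolation_step(f, t0, y0, big_h, seq=(2, 4, 6, 8, 10, 12)):
    """One step of a scalar Gragg-Bulirsch-Stoer-type extrapolation method.

    The midpoint integrations for different substep counts are mutually
    independent; threads here only illustrate that structure (CPU-bound
    Python threads give no real speedup because of the GIL).
    """
    with ThreadPoolExecutor() as pool:
        base = list(pool.map(lambda n: modified_midpoint(f, t0, y0, big_h, n), seq))
    # Aitken-Neville tableau: the midpoint error expands in even powers of h.
    table = []
    for j, value in enumerate(base):
        row = [value]
        for k in range(1, j + 1):
            ratio = (seq[j] / seq[j - k]) ** 2
            row.append(row[k - 1] + (row[k - 1] - table[j - 1][k - 1]) / (ratio - 1.0))
        table.append(row)
    return table[-1][-1]
```

For example, `extrapolation_step(lambda t, y: y, 0.0, 1.0, 1.0)` approximates e to high accuracy in a single macro step. The achievable speedup is bounded by the number of independent midpoint runs and by load imbalance (larger substep counts cost more), which is consistent with modest acceleration factors of a few.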
High quality NMR structures: a new force field with implicit water and membrane solvation for Xplor-NIH.

PubMed

Tian, Ye; Schwieters, Charles D; Opella, Stanley J; Marassi, Francesca M

2017-01-01

Structure determination of proteins by NMR is unique in its ability to measure restraints, very accurately, in environments and under conditions that closely mimic those encountered in vivo. For example, advances in solid-state NMR methods enable structure determination of membrane proteins in detergent-free lipid bilayers, and of large soluble proteins prepared by sedimentation, while parallel advances in solution NMR methods and optimization of detergent-free lipid nanodiscs are rapidly pushing the envelope of the size limit for both soluble and membrane proteins. These experimental advantages, however, are partially squandered during structure calculation, because the commonly used force fields are purely repulsive and neglect solvation, van der Waals forces and electrostatic energy. Here we describe a new force field, and updated energy functions, for protein structure calculations with EEFx implicit solvation, electrostatics, and van der Waals Lennard-Jones forces, in the widely used program Xplor-NIH. The new force field is based primarily on CHARMM22, facilitating calculations with a wider range of biomolecules. The new EEFx energy function has been rewritten to enable OpenMP parallelism, and optimized to enhance computation efficiency. It implements solvation, electrostatics, and van der Waals energy terms together, thus ensuring more consistent and efficient computation of the complete nonbonded energy lists. Updates in the related Python module allow detailed analysis of the interaction energies and associated parameters. The new force field and energy function work with both soluble proteins and membrane proteins, including those with cofactors or engineered tags, and are very effective in situations where there are sparse experimental restraints.
Results obtained for NMR-restrained calculations with a set of five soluble proteins and five membrane proteins show that structures calculated with EEFx have significant improvements in accuracy, precision, and conformation, and that structure refinement can be obtained by short relaxation with EEFx to obtain improvements in these key metrics. These developments broaden the range of biomolecular structures that can be calculated with high fidelity from NMR restraints.
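As an illustration of the kind of nonbonded term involved, here is a generic 12-6 Lennard-Jones pair sum with a distance cutoff. It is not the EEFx energy function (which also includes solvation and electrostatics); the function name, the simple cutoff and the all-pairs loop in place of a maintained nonbonded list are assumptions.

```python
import math
from itertools import combinations


def lj_energy(coords, epsilon, sigma, cutoff):
    """Total 12-6 Lennard-Jones energy over all atom pairs closer than cutoff.

    Each pair term is independent of the others, so the loop parallelizes
    naturally (e.g. as a sum reduction across threads in a compiled code).
    """
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        if r < cutoff:
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy
```

At the potential minimum, r = 2^(1/6) σ, a single pair contributes exactly -ε. Because each pair contributes independently, an OpenMP version would typically split the pair loop across threads and combine per-thread partial sums with a reduction, which is the kind of restructuring that computing all nonbonded terms in one pass makes convenient.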
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

NASA Astrophysics Data System (ADS)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit, and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian (ALE) scheme. The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea-ice forecasting and geophysical process studies over geographical domains of several million square kilometers, such as the Arctic region.
Scalable Implementation of Finite Elements by NASA - Implicit (ScIFEi)

NASA Technical Reports Server (NTRS)

Warner, James E.; Bomarito, Geoffrey F.; Heber, Gerd; Hochhalter, Jacob D.

2016-01-01

Scalable Implementation of Finite Elements by NASA (ScIFEN) is a parallel finite element analysis code written in C++. ScIFEN is designed to provide scalable solutions to computational mechanics problems. It supports a variety of finite element types, nonlinear material models, and boundary conditions. This report provides an overview of ScIFEi ("Sci-Fi"), the implicit solid mechanics driver within ScIFEN. A description of ScIFEi's capabilities is provided, including an overview of the tools and features that accompany the software as well as a description of the input and output file formats. Results from several problems are included, demonstrating the efficiency and scalability of ScIFEi by comparing to finite element analysis using a commercial code.
Multigrid treatment of implicit continuum diffusion

NASA Astrophysics Data System (ADS)

Francisquez, Manaure; Zhu, Ben; Rogers, Barrett

2017-10-01

Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
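A minimal 1D version of the multigrid idea for implicit diffusion is sketched below: one V-cycle for the backward Euler system (I - a d²/dx²)u = f, where `a` stands for ν·Δt. It assumes second-order differences, homogeneous Dirichlet boundaries, 2^k - 1 interior points, weighted-Jacobi smoothing, full-weighting restriction and linear interpolation; the authors' 2D domain-decomposed solver and their treatment of fourth- and sixth-order diffusion are not reproduced, and all names are hypothetical.

```python
def apply_op(u, a, h):
    """(I - a*d2/dx2) u with second-order differences and u = 0 at both ends."""
    n = len(u)
    c = a / (h * h)
    return [u[i] + c * (2.0 * u[i]
                        - (u[i - 1] if i > 0 else 0.0)
                        - (u[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def smooth(u, f, a, h, sweeps, omega=2.0 / 3.0):
    """Weighted-Jacobi smoothing sweeps."""
    diag = 1.0 + 2.0 * a / (h * h)
    for _ in range(sweeps):
        au = apply_op(u, a, h)
        u = [ui + omega * (fi - ai) / diag for ui, fi, ai in zip(u, f, au)]
    return u


def restrict(r):
    """Full weighting: 2m+1 fine interior points -> m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolong(ec, n):
    """Linear interpolation: m coarse points -> n = 2m+1 fine points."""
    m = len(ec)
    e = []
    for i in range(n):
        if i % 2 == 1:                       # fine point coincides with coarse
            e.append(ec[(i - 1) // 2])
        else:                                # average the coarse neighbors
            left = ec[i // 2 - 1] if i // 2 - 1 >= 0 else 0.0
            right = ec[i // 2] if i // 2 < m else 0.0
            e.append(0.5 * (left + right))
    return e


def v_cycle(u, f, a, h, sweeps=3):
    """One V-cycle for (I - a*d2/dx2) u = f; len(u) must be 2**k - 1."""
    if len(u) == 1:                          # exact solve on the coarsest grid
        return [f[0] / (1.0 + 2.0 * a / (h * h))]
    u = smooth(u, f, a, h, sweeps)
    r = [fi - ai for fi, ai in zip(f, apply_op(u, a, h))]
    ec = v_cycle([0.0] * ((len(u) - 1) // 2), restrict(r), a, 2.0 * h, sweeps)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]
    return smooth(u, f, a, h, sweeps)
```

Repeating `v_cycle` from a zero initial guess typically reduces the residual by a roughly grid-independent factor per cycle. That property, together with purely local smoothing and grid-transfer operations, is what lets multigrid keep its efficiency at large processor counts, where a global FFT-based solve loses it.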
The implementation of an aeronautical CFD flow code onto distributed memory parallel systems

NASA Astrophysics Data System (ADS)

Ierotheou, C. S.; Forsey, C. R.; Leatham, M.

2000-04-01

The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations.
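Implicit residual smoothing, in the form commonly attributed to Jameson, replaces each residual with the solution of a tridiagonal system along a grid line, which lets a multistage scheme run at larger time steps. The abstract gives no details of SAUNA's variant, so the constant smoothing coefficient `eps`, the zero-gradient end conditions and the function names below are assumptions for illustration.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (lower[0] and upper[-1] are ignored)."""
    n = len(rhs)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):                    # forward elimination
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):           # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_residual_smoothing(residual, eps):
    """Solve -eps*Rs[i-1] + (1 + 2*eps)*Rs[i] - eps*Rs[i+1] = R[i] along one
    grid line, with zero-gradient ends (Rs[-1] = Rs[0], Rs[n] = Rs[n-1])."""
    n = len(residual)
    assert n >= 2, "need at least two points on the line"
    lower = [-eps] * n
    upper = [-eps] * n
    diag = [1.0 + 2.0 * eps] * n
    diag[0] = diag[-1] = 1.0 + eps           # zero-gradient end conditions
    return thomas(lower, diag, upper, residual)
```

With these end conditions a constant residual is left unchanged and the sum of the residuals along the line is preserved, while isolated spikes are spread to their neighbors. The system is diagonally dominant, so the Thomas algorithm is stable without pivoting; in a domain-decomposed code the line solves crossing partition boundaries are what require extra communication.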
Methodology of modeling and measuring computer architectures for plasma simulations

NASA Technical Reports Server (NTRS)

Wang, L. P. T.

1977-01-01

A brief introduction to plasma simulation using computers and the difficulties of using currently available computers is given. Through the use of an analyzing and measuring methodology, SARA, the control flow and data flow of a particle simulation model, REM2-1/2D, are exemplified. After recursive refinements the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential-type simulation model, an array/pipeline-type simulation model, and a fully parallel simulation model of the code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems that are implicitly parallel in nature.
A Comparison of Students' Conceptual Understanding of Electric Circuits in Simulation Only and Simulation-Laboratory Contexts

ERIC Educational Resources Information Center

Jaakkola, Tomi; Nurmi, Sami; Veermans, Koen

2011-01-01

The aim of this experimental study was to compare learning outcomes of students using a simulation alone (simulation environment) with outcomes of those using a simulation in parallel with real circuits (combination environment) in the domain of electricity, and to explore how learning outcomes in these environments are mediated by implicit (only…
Multiobjective Multifactorial Optimization in Evolutionary Multitasking.

PubMed

Gupta, Abhishek; Ong, Yew-Soon; Feng, Liang; Tan, Kay Chen

2016-05-03

In recent decades, the field of multiobjective optimization has attracted considerable interest among evolutionary computation researchers. One of the main features that makes evolutionary methods particularly appealing for multiobjective problems is the implicit parallelism offered by a population, which enables simultaneous convergence toward the entire Pareto front. While a plethora of related algorithms have been proposed to date, a common attribute among them is that they focus on efficiently solving only a single optimization problem at a time. Despite the known power of implicit parallelism, seldom has an attempt been made to multitask, i.e., to solve multiple optimization problems simultaneously. It is contended that the notion of evolutionary multitasking leads to the possibility of automated transfer of information across different optimization exercises that may share underlying similarities, thereby facilitating improved convergence characteristics. In particular, the potential for automated transfer is deemed invaluable from the standpoint of engineering design exercises where manual knowledge adaptation and reuse are routine. Accordingly, in this paper, we present a realization of the evolutionary multitasking paradigm within the domain of multiobjective optimization. The efficacy of the associated evolutionary algorithm is demonstrated on some benchmark test functions as well as on a real-world manufacturing process design problem from the composites industry.
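The implicit parallelism of a population comes from its members covering many trade-offs at once; what the population converges toward is the nondominated (Pareto) subset. The sketch below shows only that dominance filter, with all objectives minimized and hypothetical function names; it is not the multifactorial evolutionary algorithm proposed in the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(population):
    """Return the members of a population that no other member dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]
```

For the population [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)], the filter keeps (1, 5), (2, 3) and (4, 1): three distinct trade-offs retained simultaneously by a single population. This quadratic filter is the simplest form; practical algorithms use faster nondominated sorting, but the dominance test is the same.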
Efficiency and flexibility using implicit methods within atmosphere dycores

NASA Astrophysics Data System (ADS)

Evans, K. J.; Archibald, R.; Norman, M. R.; Gardner, D. J.; Woodward, C. S.; Worley, P.; Taylor, M.

2016-12-01

A suite of explicit and implicit methods is evaluated for a range of configurations of the shallow water dynamical core within the spectral-element Community Atmosphere Model (CAM-SE) to explore their relative computational performance. The configurations are designed to explore the attributes of each method under different but relevant model usage scenarios, including varied spectral order within an element, static regional refinement, and scaling to large problem sizes. The limitations and benefits of using explicit versus implicit methods, with different discretizations and parameters, are discussed in light of trade-offs such as MPI communication, memory, and inherent efficiency bottlenecks. For the regionally refined shallow water configurations, the implicit BDF2 method is about as efficient as an explicit Runge-Kutta method, without including a preconditioner. Performance of the implicit methods with the residual function executed on a GPU is also presented; there is a speedup for the residual relative to a CPU, but overwhelming transfer costs motivate moving more of the solver to the device. Given the performance behavior of implicit methods within the shallow water dynamical core, the recommendation for future work using implicit solvers is conditional on scale separation and the stiffness of the problem. The strong growth of linear iterations with increasing resolution or time step size is the main bottleneck to computational efficiency. Within the hydrostatic dynamical core of CAM-SE, we present results utilizing approximate block factorization preconditioners implemented using the Trilinos library of solvers. They reduce the cost of linear system solves and improve parallel scalability.
We provide a summary of the remaining efficiency considerations within the preconditioner and utilization of the GPU, as well as a discussion about the benefits of a time stepping method that provides converged and stable solutions for a much wider range of time step sizes. As more complex model components, for example new physics and aerosols, are connected in the model, having flexibility in the time stepping will enable more options for combining and resolving multiple scales of behavior.
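The stability advantage of BDF2 over an explicit Runge-Kutta method at large step sizes can be seen on the linear test equation y' = λy, used here as a stand-in for stiff dynamics. The values λ = -1000 and h = 0.1, the backward Euler start-up step and the function names are assumptions for illustration; in CAM-SE each implicit step instead requires a nonlinear solve with Krylov iterations, which is where the growth in linear iterations noted above appears.

```python
def bdf2_linear(lam, y0, h, steps):
    """BDF2 for y' = lam*y, started with one backward Euler step.
    For this linear problem each implicit solve reduces to a division."""
    y_prev = y0
    y = y0 / (1.0 - h * lam)                 # backward Euler start-up step
    for _ in range(steps - 1):
        # (3/2) y_{n+1} - 2 y_n + (1/2) y_{n-1} = h*lam*y_{n+1}
        y_prev, y = y, (2.0 * y - 0.5 * y_prev) / (1.5 - h * lam)
    return y


def heun_linear(lam, y0, h, steps):
    """Explicit two-stage Runge-Kutta (Heun) method for the same problem."""
    y = y0
    for _ in range(steps):
        k1 = lam * y
        k2 = lam * (y + h * k1)
        y = y + 0.5 * h * (k1 + k2)
    return y
```

With hλ = -100, `bdf2_linear(-1000.0, 1.0, 0.1, 20)` is essentially zero, matching the decaying exact solution, whereas Heun's method multiplies the solution by 1 + hλ + (hλ)²/2 = 4901 every step and blows up. The price of the implicit method's wider stable range of step sizes is the solve per step.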
Parallel performance investigations of an unstructured mesh Navier-Stokes solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

2000-01-01

A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers

NASA Technical Reports Server (NTRS)

Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.

2014-01-01

This paper explores the implementation of MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.

Solution of partial differential equations on vector and parallel computers

NASA Technical Reports Server (NTRS)

Ortega, J. M.; Voigt, R. G.

1985-01-01

The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Mental Health Trainees' Explicit and Implicit Attitudes Toward Transracial Adoptive Families Headed by Lesbian, Gay, and Heterosexual Couples.

PubMed

Tan, Tony Xing; Jordan-Arthur, Brittany; Garafano, Jeffrey S; Curran, Laura

2017-01-01

We investigated the explicit and implicit attitudes of 109 mental health trainees (79.8% female; 76% White; 83.5% heterosexual) toward heterosexual, lesbian, and gay White couples adopting and raising Black children. To determine explicit attitudes, we used a vignette depicting a Black child ready for adoption and three types of equally qualified White families headed by a heterosexual couple, a gay couple, or a lesbian couple. The trainees were asked to indicate which type of family they preferred to adopt the child. To determine implicit attitudes, we used a computer-programmed, latency-based multifactor implicit association test (IAT) protocol. The IAT data were collected from each participant individually. Explicit data showed that over 80% of the participants indicated no strong preference in terms of which type of family should adopt the child. However, IAT data showed that the trainees implicitly preferred lesbian couples. Overall, the degree of congruence between explicit and implicit attitudes was very low. Implications for training were discussed.
Effect of Implicit Perceptual-Motor Training on Decision-Making Skills and Underpinning Gaze Behavior in Combat Athletes.

PubMed

Milazzo, Nicolas; Farrow, Damian; Fournier, Jean F

2016-08-01

This study investigated the effect of a 12-session implicit perceptual-motor training program on the decision-making skills and visual search behavior of highly skilled junior female karate fighters (M age = 15.7 years, SD = 1.2). Eighteen participants were required to make (physical or verbal) reaction decisions to various attacks within different fighting scenarios. Fighters' performance and eye movements were assessed before and after the intervention, and during acquisition, through video-based and on-mat decision-making tests. The video-based test revealed that following training, only the implicit perceptual-motor group (n = 6) significantly improved their decision-making accuracy compared to a matched motor training (placebo, n = 6) group and a control group (n = 6). Further, the implicit training group significantly changed their visual search behavior by focusing on fewer locations for longer durations. In addition, the session-by-session analysis showed no significant improvement in decision accuracy between training session 1 and any other session except the last one. Coaches should devote more practice time to implicit learning approaches during perceptual-motor training programs to achieve significant decision-making improvements and more efficient visual search strategies with elite athletes. © The Author(s) 2016.
Is abstinence education theory based? The underlying logic of abstinence education programs in Texas.

PubMed

Goodson, Patricia; Pruitt, B E; Suther, Sandy; Wilson, Kelly; Buhi, Eric

2006-04-01

The authors examined the logic (or the implicit theory) underlying 16 abstinence-only-until-marriage programs in Texas (50% of all programs funded under the federal welfare reform legislation during 2001 and 2002). Program staff's implicit theories, defined as sets of propositions regarding the relationship between program activities and their intended outcomes, were summarized and compared to (a) data from studies on adolescent sexual behavior, (b) a theory-based model of youth abstinent behavior, and (c) preliminary findings from the national evaluation of Title V programs. The authors interviewed 62 program directors and instructors and employed selected principles of grounded theory to analyze interview data. Findings indicated that abstinence education staff could clearly articulate the logic guiding program activity choices. Comparisons between interview data and a theory-based model of adolescent sexual behavior revealed striking similarities. Implications of these findings for conceptualizing and evaluating abstinence-only-until-marriage (or similar) programs are examined.
Photochemical numerics for global-scale modeling: Fidelity and GCM testing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Elliott, S.; Jim Kao, Chih-Yue; Zhao, X.

1995-03-01

Atmospheric photochemistry lies at the heart of global-scale pollution problems, but it is a nonlinear system embedded in nonlinear transport and so must be modeled in three dimensions. Total earth grids are massive and kinetics require dozens of interacting tracers, taxing supercomputers to their limits in global calculations. A matrix-free and noniterative family scheme is described that permits chemical step sizes an order of magnitude or more larger than time constants for molecular groupings, in the 1-h range used for transport. Families are partitioned through linearized implicit integrations that produce stabilizing species concentrations for a mass-conserving forward solver. The kinetics are also parallelized by moving geographic loops innermost, and changes in the continuity equations are automated through list reading. The combination of speed, parallelization and automation renders the programs naturally modular. Accuracy lies within 1% for all species in week-long fidelity tests. A 50-species, 150-reaction stratospheric module tested in a spectral GCM benchmarks at 10 min CPU time per day and agrees with lower-dimensionality simulations. Tropospheric nonmethane hydrocarbon chemistry will soon be added, and inherently three-dimensional phenomena will be investigated both decoupled from dynamics and in a complete chemical GCM. 225 refs., 11 figs., 2 tabs.
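The abstract does not spell out the family scheme itself. The general idea of a noniterative implicit chemical step that stays stable far beyond a species' lifetime can be sketched with the common linearized backward-Euler update in production/loss form, c_new = (c_old + P*dt) / (1 + L*dt). This is a generic illustration, not the authors' scheme. The single species, the rates and the grid size are hypothetical. The arrays run over grid cells, mirroring the idea of placing the geographic loop innermost so that it vectorizes.

```python
import numpy as np

def linearized_implicit_step(c, production, loss_freq, dt):
    """One noniterative, linearized backward-Euler step for dc/dt = P - L*c.

    Every argument except dt is an array over grid cells, so the update is
    applied to all cells at once (the "geographic" loop is the vectorized one).
    """
    return (c + production * dt) / (1.0 + loss_freq * dt)

# Hypothetical setup: one species in 10,000 grid cells with a 60 s lifetime.
n_cells = 10_000
rng = np.random.default_rng(0)
c = rng.uniform(0.0, 1.0, n_cells)   # initial concentrations (arbitrary units)
P = np.full(n_cells, 1.0e-3)         # production rate per second
L = np.full(n_cells, 1.0 / 60.0)     # loss frequency = 1 / lifetime, per second
dt = 3600.0                          # 1 h chemical step, 60x the lifetime

for _ in range(24):                  # one simulated day
    c = linearized_implicit_step(c, P, L, dt)

print(c.min(), c.max())              # both approach the steady state P / L = 0.06
```

With a step 60 times the lifetime, an explicit Euler step would multiply the deviation from steady state by 1 - L*dt = -59 at every step and diverge. The linearized implicit form instead divides that deviation by 1 + L*dt = 61.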
MrBayes tgMC3++: A High Performance and Resource-Efficient GPU-Oriented Phylogenetic Analysis Method.

PubMed

Ling, Cheng; Hamada, Tsuyoshi; Gao, Jingyang; Zhao, Guoguang; Sun, Donghong; Shi, Weifeng

2016-01-01

MrBayes is a widely used phylogenetic inference tool that harnesses empirical evolutionary models and Bayesian statistics. However, likelihood estimation is computationally very expensive, resulting in undesirably long execution times. Although a number of multi-threaded optimizations have been proposed to speed up MrBayes, bottlenecks remain that severely limit the GPU thread-level parallelism of likelihood estimations. This study proposes a high-performance and resource-efficient method for GPU-oriented parallelization of likelihood estimations. Instead of relying on empirical programming, the proposed novel decomposition storage model implements high-performance data transfers implicitly. In terms of performance improvement, a speedup factor of up to 178 can be achieved on the analysis of simulated datasets using four Tesla K40 cards. Compared with the other publicly available GPU-oriented versions of MrBayes, the tgMC3++ method (proposed herein) outperforms the tgMC3 (v1.0), nMC3 (v2.1.1) and oMC3 (v1.00) methods by speedup factors of up to 1.6, 1.9 and 2.9, respectively. Moreover, tgMC3++ supports more evolutionary models and gamma categories, which previous GPU-oriented methods fail to support.
["Anything goes"?: the implicit dialogue between Paul Feyerabend and two Brazilian researchers, Maurício da Rocha e Silva and Newton Freire-Maia].

PubMed

Bastos, Francisco Inácio

2010-03-01

The philosopher Paul Feyerabend and the Brazilian scientists Maurício da Rocha e Silva and Newton Freire-Maia were contemporaries and lived surrounded by the fundamental dilemmas of science. Feyerabend's anarchist proposal, then embryonic, was formulated in parallel by Rocha e Silva in his criticism of the scientific method. Two decades later, Feyerabend's ideas seemed to implicitly stimulate Newton Freire-Maia in his reflections on science. The web of interrelationships among the ideas of these three men, who never interacted, touches on central issues for Brazilian science from 1960 to 1980, a period in which Brazilian science was consolidated in dialogue with the nascent reflection on science and the scientific method in Brazil.
Giftedness in Arabic Environments: Concepts, Implicit Theories, and the Contributed Factors in the Enrichment Programs

ERIC Educational Resources Information Center

Aljughaiman, Abdullah M.; Ayoub, Alaa Eldin A.

2017-01-01

The study aimed at identifying specific giftedness patterns that teachers discriminate against, and for, when nominating gifted students and focused on the identification of implicit theories adopted by teachers on the topics of intelligence, giftedness, and creativity in light of their specialization and experience. The study examined the…
The Implicit Curriculum Survey: An Examination of the Psychometric Properties

ERIC Educational Resources Information Center

Grady, Melissa D.; Swick, Danielle C.; Powers, Joelle D.

2018-01-01

This study examined the psychometric properties of the Implicit Curriculum Survey (ICS) using an exploratory factor analysis (EFA). Students enrolled in four different MSW programs (N = 262) from different geographic locations completed the ICS, which is a Web-based survey. The domains of the ICS include field, academics, community, diversity,…
Sleep benefits in parallel implicit and explicit measures of episodic memory.

PubMed

Weber, Frederik D; Wang, Jing-Yi; Born, Jan; Inostroza, Marion

2014-03-14

Research in rats using preferences during exploration as a measure of memory has indicated that sleep is important for the consolidation of episodic-like memory, i.e., memory for an event bound into a specific spatio-temporal context. How these findings relate to human episodic memory is unclear. We used spontaneous preferences during visual exploration and verbal recall as, respectively, implicit and explicit measures of memory, to study effects of sleep on episodic memory consolidation in humans. During encoding before 10-h retention intervals that covered nighttime sleep or daytime wakefulness, two groups of young adults were presented with two episodes 1 h apart. Each episode entailed a spatial configuration of four different faces in a 3 × 3 grid of locations. After the retention interval, implicit spatio-temporal recall performance was assessed by eye-tracking visual exploration of another configuration of four faces, two of which were from the first and second episode, respectively; of these two faces, one was presented at the same location as during encoding and the other at a different location. Afterward, explicit verbal recall was assessed. Measures of implicit and explicit episodic memory retention were positively correlated (r = 0.57, P < 0.01), and both were better after nighttime sleep than after daytime wakefulness (P < 0.05). In the sleep group, implicit episodic memory recall was associated with increased fast spindles during nonrapid eye movement (NonREM) sleep (r = 0.62, P < 0.05). Together with concordant observations in rats, our results indicate that consolidation of genuinely episodic memory benefits from sleep.
Sleep benefits in parallel implicit and explicit measures of episodic memory

PubMed Central

Weber, Frederik D.; Wang, Jing-Yi; Born, Jan; Inostroza, Marion

2014-01-01

Research in rats using preferences during exploration as a measure of memory has indicated that sleep is important for the consolidation of episodic-like memory, i.e., memory for an event bound into a specific spatio-temporal context. How these findings relate to human episodic memory is unclear. We used spontaneous preferences during visual exploration and verbal recall as, respectively, implicit and explicit measures of memory, to study effects of sleep on episodic memory consolidation in humans. During encoding before 10-h retention intervals that covered nighttime sleep or daytime wakefulness, two groups of young adults were presented with two episodes 1 h apart. Each episode entailed a spatial configuration of four different faces in a 3 × 3 grid of locations. After the retention interval, implicit spatio-temporal recall performance was assessed by eye-tracking visual exploration of another configuration of four faces, two of which were from the first and second episode, respectively; of these two faces, one was presented at the same location as during encoding and the other at a different location. Afterward, explicit verbal recall was assessed. Measures of implicit and explicit episodic memory retention were positively correlated (r = 0.57, P < 0.01), and both were better after nighttime sleep than after daytime wakefulness (P < 0.05). In the sleep group, implicit episodic memory recall was associated with increased fast spindles during nonrapid eye movement (NonREM) sleep (r = 0.62, P < 0.05). Together with concordant observations in rats, our results indicate that consolidation of genuinely episodic memory benefits from sleep. PMID:24634354
[Supercomputer investigation of the protein-ligand system low-energy minima].

PubMed

Oferkin, I V; Sulimov, A V; Katkova, E V; Kutov, D K; Grigoriev, F V; Kondakova, O A; Sulimov, V B

2015-01-01

The accuracy of protein-ligand binding energy calculations and ligand positioning is strongly influenced by the choice of the docking target function. This work evaluates five different target functions used in docking: functions based on the MMFF94 force field and functions based on the PM7 quantum-chemical method, with or without an implicit solvent model (PCM, COSMO or SGB). For this purpose, the ligand positions corresponding to the minima of the target function were compared with the experimentally known ligand positions in the protein active site (crystal ligand positions). Each function was examined on the same test set of 16 protein-ligand complexes. The new parallelized docking program FLM, based on a Monte Carlo search algorithm, was developed to perform a comprehensive search for low-energy minima and to calculate protein-ligand binding energies. This study demonstrates that the docking target function based on the MMFF94 force field can be used to detect the crystal or near-crystal positions of the ligand by finding the spectrum of low-energy local minima of the target function. The importance of accounting for solvent in the docking process for accurate ligand positioning is also shown. The accuracy of ligand positioning, as well as the correlation between calculated and experimentally determined protein-ligand binding energies, improves when the MMFF94 force field is replaced by the new PM7 method with an implicit solvent model.
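The abstract only names FLM's search procedure, so the following is a hedged stand-in rather than FLM's algorithm. It runs a random multistart search, one of the simplest Monte Carlo approaches, on a hypothetical one-dimensional target function with many local minima. The search collects the distinct local minima and sorts them by energy, giving a small low-energy minima spectrum. Each trial is independent, which is what makes such searches easy to parallelize. Real docking would search over ligand position, orientation and torsions instead of a single coordinate.

```python
import math
import random

def energy(x):
    """Hypothetical 1D target function with many local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def gradient(x):
    """Analytic derivative of energy(x)."""
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x, step=0.01, iters=2000):
    """Plain gradient descent from x to a nearby local minimum."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x

def monte_carlo_minima(n_trials=200, lo=-5.0, hi=5.0, seed=1, tol=1e-6):
    """Random multistart search; returns distinct minima sorted by energy."""
    rng = random.Random(seed)
    found = []
    for _ in range(n_trials):              # trials are independent, so parallelizable
        x_min = local_minimize(rng.uniform(lo, hi))
        if all(abs(x_min - x) > tol for x in found):
            found.append(x_min)
    return sorted(((x, energy(x)) for x in found), key=lambda pair: pair[1])

spectrum = monte_carlo_minima()
for x, e in spectrum[:3]:
    print(f"x = {x:+.4f}  E = {e:.4f}")
```

The lowest entry of the spectrum plays the role of the predicted pose. In the study's terms, a target function is useful when that pose, or one of the next few, lies near the crystal position.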
Numerical studies of unsteady two dimensional subsonic flows using the ICE method. Ph.D. Thesis - Toledo Univ.

NASA Technical Reports Server (NTRS)

Wieber, P. R.

1973-01-01

A numerical program was developed to compute transient compressible and incompressible laminar flows in two dimensions with multicomponent mixing and chemical reaction. The algorithm used the Los Alamos Scientific Laboratory ICE (Implicit Continuous-Fluid Eulerian) method as its base. The program can compute both high and low speed compressible flows. The numerical program incorporating the stabilization techniques was quite successful in treating both old and new problems. Detailed calculations of coaxial flow very close to the entry plane were possible. The program treated complex flows such as the formation and downstream growth of a recirculation cell. An implicit solution of the species equation predicted mixing and reaction rates which compared favorably with the literature.
An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures

NASA Technical Reports Server (NTRS)

Cohl, H.

1994-01-01

We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving elliptic PDEs, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication.
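The key step in the abstract is that operator splitting leaves a large set of independent tridiagonal systems. One way to solve such a batch is the Thomas algorithm vectorized across systems. The elimination runs sequentially along each system, while every arithmetic operation acts on all systems at once, which is the data-parallel direction. This minimal sketch assumes NumPy and diagonally dominant systems, since it does no pivoting. The per-mode diagonal shift is a hypothetical stand-in for the terms an FFT in one direction would contribute, and the code is not Cohl's MasPar implementation.

```python
import numpy as np

def thomas_batched(a, b, c, d):
    """Solve many tridiagonal systems at once with the Thomas algorithm.

    Each argument has shape (n_systems, n). Row k of system s reads
        a[s, k] * x[s, k-1] + b[s, k] * x[s, k] + c[s, k] * x[s, k+1] = d[s, k]
    with a[:, 0] and c[:, -1] ignored. No pivoting is done, so the systems
    are assumed to be diagonally dominant.
    """
    n = b.shape[1]
    cp = np.empty_like(c, dtype=float)
    dp = np.empty_like(d, dtype=float)
    cp[:, 0] = c[:, 0] / b[:, 0]
    dp[:, 0] = d[:, 0] / b[:, 0]
    for k in range(1, n):                      # forward elimination, all systems at once
        denom = b[:, k] - a[:, k] * cp[:, k - 1]
        cp[:, k] = c[:, k] / denom
        dp[:, k] = (d[:, k] - a[:, k] * dp[:, k - 1]) / denom
    x = np.empty_like(dp)
    x[:, -1] = dp[:, -1]
    for k in range(n - 2, -1, -1):             # back substitution
        x[:, k] = dp[:, k] - cp[:, k] * x[:, k + 1]
    return x

rng = np.random.default_rng(0)
n_sys, n = 64, 100
shift = np.linspace(0.1, 4.0, n_sys)[:, None]   # hypothetical per-mode diagonal shift
a = -np.ones((n_sys, n))                        # sub-diagonal
c = -np.ones((n_sys, n))                        # super-diagonal
b = 2.0 + shift * np.ones((n_sys, n))           # main diagonal
d = rng.standard_normal((n_sys, n))
x = thomas_batched(a, b, c, d)

# Cross-check the first system against a dense solve.
A0 = np.diag(b[0]) + np.diag(a[0, 1:], -1) + np.diag(c[0, :-1], 1)
max_residual = np.max(np.abs(A0 @ x[0] - d[0]))
```

On SIMD hardware, the batch dimension maps naturally onto processors. Each system's data stays local, which is consistent with the abstract's point about avoiding interprocessor communication.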
Reactive Transport Modeling of Induced Calcite Precipitation Reaction Fronts in Porous Media Using A Parallel, Fully Coupled, Fully Implicit Approach

NASA Astrophysics Data System (ADS)

Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.

2010-12-01

Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes, which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a decoupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian-Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinement for 2D and 3D structured and unstructured meshes.
The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral-forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches over decoupled methods for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.
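The Jacobian-free Newton-Krylov idea named above fits in a short sketch. Newton's method only needs products of the Jacobian with a vector, and those can be approximated by a finite difference of the residual, J v ≈ (F(u + eps*v) - F(u)) / eps, so the Jacobian is never formed. The code assumes NumPy, uses a plain unrestarted GMRES without a preconditioner, and solves a hypothetical 1D reaction-diffusion residual. It is not RAT's or MOOSE's implementation, and the physics-based preconditioning the abstract mentions is omitted.

```python
import numpy as np

def gmres(matvec, rhs, tol=1e-10, max_iter=50):
    """Unrestarted GMRES from a zero initial guess, using only matvec(v)."""
    n = rhs.size
    m = min(max_iter, n)
    beta = np.linalg.norm(rhs)
    if beta == 0.0:
        return np.zeros_like(rhs)
    Q = np.zeros((n, m + 1))                  # Krylov basis vectors
    H = np.zeros((m + 1, m))                  # upper Hessenberg matrix
    Q[:, 0] = rhs / beta
    for j in range(m):
        w = matvec(Q[:, j])
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[: j + 2, : j + 1], e1, rcond=None)
        residual = np.linalg.norm(H[: j + 2, : j + 1] @ y - e1)
        if residual < tol * beta or H[j + 1, j] < 1e-14:
            break
        Q[:, j + 1] = w / H[j + 1, j]
    return Q[:, : j + 1] @ y

def jfnk(F, u0, tol=1e-8, max_newton=20):
    """Newton's method in which J(u) @ v is replaced by a finite difference of F."""
    u = u0.copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        eps = 1e-7 * (1.0 + np.linalg.norm(u))   # GMRES passes unit-norm vectors

        def jac_vec(v, u=u, r=r, eps=eps):
            return (F(u + eps * v) - r) / eps     # J(u) v without forming J

        u = u + gmres(jac_vec, -r)
    return u

# Hypothetical test residual: -u'' + u**3 = 1 on (0, 1) with u(0) = u(1) = 0.
n = 20
h = 1.0 / (n + 1)

def F(u):
    padded = np.concatenate(([0.0], u, [0.0]))   # Dirichlet boundary values
    lap = (padded[:-2] - 2.0 * padded[1:-1] + padded[2:]) / h**2
    return -lap + u**3 - 1.0

u = jfnk(F, np.zeros(n))
```

Only the residual function F is problem-specific. This is why a framework can couple new transport and reaction terms without hand-coding their Jacobians, although a good preconditioner becomes essential at scale.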
3D Gaussian Beam Modeling

DTIC Science & Technology

2011-09-01

optimized building blocks such as a parallelized tri-diagonal linear solver (used in the “implicit finite differences” and split-step Padé PE models... and Ding Lee. “A finite-difference treatment of interface conditions for the parabolic wave equation: The horizontal interface.” The Journal of the... Acoustical Society of America, 71(4):855, 1982. 3. Ding Lee and Suzanne T. McDaniel. “A finite-difference treatment of interface conditions for
Task 7: ADPAC User's Manual

NASA Technical Reports Server (NTRS)

Hall, E. J.; Topp, D. A.; Delaney, R. A.

1996-01-01

The overall objective of this study was to develop a 3-D numerical analysis for compressor casing treatment flowfields. The current version of the computer code resulting from this study is referred to as ADPAC (Advanced Ducted Propfan Analysis Codes-Version 7). This report is intended to serve as a computer program user's manual for the ADPAC code developed under Tasks 6 and 7 of the NASA Contract. The ADPAC program is based on a flexible multiple-block grid discretization scheme permitting coupled 2-D/3-D mesh block solutions with application to a wide variety of geometries. Aerodynamic calculations are based on a four-stage Runge-Kutta time-marching finite volume solution technique with added numerical dissipation. Steady flow predictions are accelerated by a multigrid procedure. An iterative implicit algorithm is available for rapid time-dependent flow calculations, and an advanced two-equation turbulence model is incorporated to predict complex turbulent flows. The consolidated code generated during this study is capable of executing in either a serial or parallel computing mode from a single source code. Numerous examples are given in the form of test cases to demonstrate the utility of this approach for predicting the aerodynamics of modern turbomachinery configurations.
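A four-stage Runge-Kutta time-marching step of the kind described can be sketched as below. The stage coefficients 1/4, 1/3, 1/2, 1 are an assumption (a common choice in Jameson-style finite-volume solvers), not taken from the ADPAC manual. The residual is a hypothetical 1D periodic advection operator: central differencing plus a fourth-difference term standing in for the added numerical dissipation. Multigrid acceleration and the implicit algorithm are not shown.

```python
import numpy as np

# Assumed stage coefficients (a common Jameson-style choice, not taken from ADPAC).
ALPHAS = (1.0 / 4.0, 1.0 / 3.0, 1.0 / 2.0, 1.0)

def four_stage_rk_step(u, residual, dt):
    """Advance du/dt = residual(u) by one four-stage Runge-Kutta step.

    Every stage restarts from the base state u^n and uses only the latest
    stage value, so the scheme needs just the base state and one stage array.
    """
    u_stage = u
    for alpha in ALPHAS:
        u_stage = u + alpha * dt * residual(u_stage)
    return u_stage

def advection_residual(u, a=1.0, dx=0.01, k4=1.0 / 32.0):
    """Hypothetical finite-volume residual for u_t + a u_x = 0 on a periodic grid:
    central flux differences plus fourth-difference artificial dissipation."""
    central = -a * (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)
    fourth_diff = (np.roll(u, -2) - 4.0 * np.roll(u, -1) + 6.0 * u
                   - 4.0 * np.roll(u, 1) + np.roll(u, 2))
    return central - k4 * abs(a) / dx * fourth_diff

x = np.arange(100) * 0.01
u0 = np.exp(-200.0 * (x - 0.5) ** 2)      # smooth pulse
dt = 0.8 * 0.01                           # CFL number 0.8 for a = 1, dx = 0.01
u = u0.copy()
for _ in range(125):                      # 125 * 0.008 = 1.0, one period of the domain
    u = four_stage_rk_step(u, advection_residual, dt)
```

For a linear residual, these coefficients reproduce the fourth-order Taylor polynomial of the exact update. The dissipation term damps the odd-even mode that central differencing alone leaves undamped.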
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs, with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs that allow programmers to use rich data types in data parallel programs, which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

NASA Astrophysics Data System (ADS)

Ji, X.; Shen, C.

2017-12-01

Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocities both demand prohibitively small time steps, even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. The model applies a semi-implicit, semi-Lagrangian (SISL) scheme to solve the dynamic wave equations and, with the assistance of the multi-mesh method, adaptively uses the dynamic wave equation only in areas of deep inundation. The model therefore achieves a balance between accuracy and computational cost.
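Why a semi-Lagrangian scheme escapes the CFL restriction can be seen in a minimal 1D sketch. Each grid point traces its trajectory back to a departure point and interpolates the old field there, so the time step is not tied to u*dt/dx <= 1. This toy assumes NumPy, a constant velocity, a periodic domain and linear interpolation. It covers only the semi-Lagrangian advection part, not the model's semi-implicit dynamic wave solver.

```python
import numpy as np

def semi_lagrangian_step(q, u, dt, dx):
    """Advect q with a constant velocity u on a periodic 1D grid.

    Each grid point traces its trajectory back to the departure point
    x_i - u*dt and takes the value there by linear interpolation. The
    departure point may lie several cells upstream, so the Courant number
    u*dt/dx is not limited to 1.
    """
    n = q.size
    departure = np.arange(n) - u * dt / dx      # departure points, in grid units
    left = np.floor(departure).astype(int)      # grid point just upstream of it
    w = departure - left                        # interpolation weight of right neighbor
    return (1.0 - w) * q[left % n] + w * q[(left + 1) % n]

n = 200
dx = 1.0 / n
x = np.arange(n) * dx
q0 = np.where(np.abs(x - 0.5) < 0.1, 1.0, 0.0)  # square pulse
u, dt = 1.0, 2.5 * dx                           # Courant number 2.5
q = q0.copy()
for _ in range(80):                             # 80 * 2.5 cells = one trip around the domain
    q = semi_lagrangian_step(q, u, dt, dx)
```

Each new value is a convex combination of two old values, so the scheme cannot create new extrema at any Courant number. The price is numerical diffusion from the interpolation.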
Learning Nature of Science Concepts through a Research Apprenticeship Program: A Comparative Study of Three Approaches

ERIC Educational Resources Information Center

Burgin, Stephen R.; Sadler, Troy D.

2016-01-01

This study explored the merits of three approaches (explicit, reflective and implicit) to Nature of Science (NOS) teaching and learning, examining their effects on high school student participants' NOS ideas in the context of a summer research experience. The effectiveness of explicit over implicit approaches has been demonstrated in school contexts, but less…

An Optimized Multicolor Point-Implicit Solver for Unstructured Grid Applications on Graphics Processing Units

NASA Technical Reports Server (NTRS)

Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana

2016-01-01

In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructured-grid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large, tightly coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
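A multicolor point-implicit sweep can be sketched as block Gauss-Seidel over a graph coloring. Nodes with the same color share no off-diagonal block, so all nodes of one color can be updated concurrently. That independence is what a GPU implementation exploits. The sketch assumes NumPy, a greedy coloring, 2x2 blocks on a hypothetical five-node ring, and a plain Python loop in place of GPU kernels. It is not the authors' BLAS-based reformulation.

```python
import numpy as np

def greedy_coloring(neighbors):
    """Give each node the smallest color not used by an already-colored neighbor."""
    colors = {}
    for node in sorted(neighbors):
        used = {colors[m] for m in neighbors[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

def multicolor_sweep(diag, offdiag, b, x, colors):
    """One multicolor point-implicit (block Gauss-Seidel) sweep, updating x in place.

    diag[i] is node i's dense diagonal block; offdiag[i] lists (j, block) pairs.
    Nodes of one color never couple to each other, so the updates inside a
    color are independent and could run concurrently; here they run in a loop.
    """
    for color in sorted(set(colors.values())):
        for i in [node for node, c in colors.items() if c == color]:
            rhs = b[i] - sum(block @ x[j] for j, block in offdiag[i])
            x[i] = np.linalg.solve(diag[i], rhs)   # small dense "point" solve
    return x

n_nodes, bs = 5, 2                                 # hypothetical 5-node ring, 2x2 blocks
diag = {i: np.array([[4.0, 1.0], [1.0, 4.0]]) for i in range(n_nodes)}
offdiag = {i: [((i - 1) % n_nodes, -np.eye(bs)), ((i + 1) % n_nodes, -np.eye(bs))]
           for i in range(n_nodes)}
neighbors = {i: [j for j, _ in offdiag[i]] for i in range(n_nodes)}
b = {i: np.array([1.0, float(i)]) for i in range(n_nodes)}
x = {i: np.zeros(bs) for i in range(n_nodes)}

colors = greedy_coloring(neighbors)                # an odd ring needs 3 colors
for _ in range(100):
    multicolor_sweep(diag, offdiag, b, x, colors)

# Dense reference solution for comparison.
A = np.zeros((n_nodes * bs, n_nodes * bs))
for i in range(n_nodes):
    A[i * bs:(i + 1) * bs, i * bs:(i + 1) * bs] += diag[i]
    for j, block in offdiag[i]:
        A[i * bs:(i + 1) * bs, j * bs:(j + 1) * bs] += block
x_dense = np.linalg.solve(A, np.concatenate([b[i] for i in range(n_nodes)]))
x_mc = np.concatenate([x[i] for i in range(n_nodes)])
```

The per-color loops show where the challenges listed in the abstract come from. The number of nodes per color sets the available parallelism, and the `offdiag` lookups are the indirect memory accesses.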
A scalable, fully implicit algorithm for the reduced two-field low-β extended MHD model

DOE PAGES

Chacon, Luis; Stanier, Adam John

2016-12-01

Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm is shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.
Application of Parallel Time-Implicit Discontinuous Galerkin Finite Element Methods to Hypersonic Nonequilibrium Flow Problems

DTIC Science & Technology

2014-05-01

heating prediction to grid alignment along the shock; 1-12 Large variation in heating predictions for 3D hypersonic flow over cylinder ... 4-12 Taylor Vortex problem; 4-13 Taylor Vortex problem: 3D ... 6-16 3D contours for temperature, T, for MIG and US3D for only O2 test case; 6-17 Stagnation line plots for only
Application of the Hughes-LIU algorithm to the 2-dimensional heat equation

NASA Technical Reports Server (NTRS)

Malkus, D. S.; Reichmann, P. I.; Haftka, R. T.

1982-01-01

An implicit-explicit algorithm for the solution of transient problems in structural dynamics is described. The method involves dividing the finite elements into implicit and explicit groups while automatically satisfying the conditions. This algorithm is applied to the solution of the linear, transient, two-dimensional heat equation, subject to an initial condition derived from the solution of a steady-state problem, over an L-shaped region made up of a good conductor and an insulating material. Using the IIT/PRIME computer with virtual memory, a FORTRAN program was developed to make accuracy, stability, and cost comparisons among the fully explicit Euler, the Hughes-Liu, and the fully implicit Crank-Nicolson algorithms. The Hughes-Liu claim that the explicit group governs the stability of the entire region, while the implicit group remains unconditionally stable, is illustrated.
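The stability contrast the study measures can be reproduced on a smaller problem. The sketch below assumes NumPy and integrates the 1D heat equation, not the 2D L-shaped conductor/insulator region, with r = alpha*dt/dx^2 = 1. That is twice the explicit Euler limit of 1/2. The explicit update multiplies the highest-frequency mode by about 1 - 4r = -3 per step, while Crank-Nicolson damps it. The partitioned Hughes-Liu scheme itself is not implemented here.

```python
import numpy as np

def laplacian_matrix(n):
    """Second-difference matrix for n interior points with zero Dirichlet ends."""
    return (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1))

def explicit_euler(u, r, steps):
    """Forward Euler: u <- u + r * L u (stable only for r <= 1/2)."""
    L = laplacian_matrix(u.size)
    for _ in range(steps):
        u = u + r * (L @ u)
    return u

def crank_nicolson(u, r, steps):
    """Crank-Nicolson: (I - r/2 L) u_new = (I + r/2 L) u (stable for any r)."""
    n = u.size
    L = laplacian_matrix(n)
    A = np.eye(n) - 0.5 * r * L          # implicit half
    B = np.eye(n) + 0.5 * r * L          # explicit half
    for _ in range(steps):
        u = np.linalg.solve(A, B @ u)
    return u

n = 49
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
u0 = np.sin(np.pi * x) + 0.01 * np.sin(49 * np.pi * x)   # smooth + highest-frequency mode
r = 1.0                                                  # alpha*dt/dx^2
u_explicit = explicit_euler(u0, r, 50)
u_cn = crank_nicolson(u0, r, 50)
```

A small high-frequency perturbation is enough to expose the instability. After 50 explicit steps, it has grown by roughly 3^50 while the Crank-Nicolson solution simply decays.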
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. With great progress in hardware and software technologies, the performance of parallel programs using compiler directives has improved greatly. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Analysis of composite ablators using massively parallel computation

NASA Technical Reports Server (NTRS)

Shia, David

1995-01-01

In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions, and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: it incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small time steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems.
The gas storage term is included in the explicit pressure calculation of both problems. Results from ablative composite plate problems are compared with previous numerical results that did not include the gas storage term. The through-thickness temperature distribution is found to be only weakly affected by the gas storage term. However, the through-thickness pressure and stress distributions and the extent of chemical reactions differ from the previous numerical results. Two types of chemical reaction models are used in the restrained thermal growth testing problem: (1) pressure-independent Arrhenius-type rate equations and (2) pressure-dependent Arrhenius-type rate equations. The numerical results are compared with experimental results, and the pressure-dependent model captures the trend better than the pressure-independent one. Finally, a performance study of the hybrid algorithm is carried out using the ablative composite plate problem. Good speedup is obtained on the CM-5: on 32 CPUs, the speedup is 20. The efficiency of the algorithm is found to depend on the size and execution time of a given problem and on how effectively the algorithm is parallelized. There also appears to be an optimum number of CPUs for a given problem.
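The explicit/implicit stability trade-off described above can be illustrated with a standard von Neumann analysis. As an assumption for illustration (the abstract does not give the ablation equations), take the 1D heat equation u_t = alpha * u_xx with diffusion number r = alpha * dt / dx**2. A Fourier mode with phase angle theta is multiplied each step by G = 1 - 4r sin^2(theta/2) in the explicit (forward-time, centered-space) scheme and by G = 1 / (1 + 4r sin^2(theta/2)) in the fully implicit (backward Euler) scheme; the scheme is stable when max |G| <= 1. A minimal sketch:

```python
import math


def max_amplification(r: float, scheme: str, samples: int = 721) -> float:
    """Largest |G| over Fourier modes theta in [0, pi] for u_t = alpha * u_xx.

    r = alpha * dt / dx**2 is the diffusion number. This model problem is an
    illustrative assumption, not the ablation equations from the abstract.
    """
    worst = 0.0
    for i in range(samples):
        theta = math.pi * i / (samples - 1)
        s = 4.0 * r * math.sin(theta / 2.0) ** 2
        if scheme == "explicit":
            g = 1.0 - s          # forward Euler in time, centered in space (FTCS)
        elif scheme == "implicit":
            g = 1.0 / (1.0 + s)  # backward Euler in time, centered in space
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        worst = max(worst, abs(g))
    return worst


for r in (0.4, 0.5, 0.6, 5.0):
    print(f"r = {r}: explicit max|G| = {max_amplification(r, 'explicit'):.3f}, "
          f"implicit max|G| = {max_amplification(r, 'implicit'):.3f}")
```

The explicit scheme stays bounded only for r <= 1/2, that is dt <= dx**2 / (2 * alpha), so halving the grid spacing cuts the allowed time step by a factor of four. The implicit scheme is stable for any r but needs a linear solve at every step, which is the parallelization difficulty the abstract mentions. For the reported speedup of 20 on 32 CPUs, the parallel efficiency is 20/32, about 0.63.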
Global magnetosphere simulations using constrained-transport Hall-MHD with CWENO reconstruction

NASA Astrophysics Data System (ADS)

Lin, L.; Germaschewski, K.; Maynard, K. M.; Abbott, S.; Bhattacharjee, A.; Raeder, J.

2013-12-01

We present a new CWENO (Centrally-Weighted Essentially Non-Oscillatory) reconstruction-based MHD solver for the OpenGGCM global magnetosphere code. The solver was built using libMRC, a library for creating efficient parallel PDE solvers on structured grids. libMRC provides an automated code-generation framework that takes a user-provided PDE right-hand side in symbolic form and generates efficient, architecture-specific parallel code. libMRC also supports block-structured adaptive mesh refinement and implicit time stepping through integration with the PETSc library. We validate the new CWENO Hall-MHD solver against existing solvers, both on standard test problems and in global magnetosphere simulations.
Highly Parallel Alternating Directions Algorithm for Time Dependent Problems

NASA Astrophysics Data System (ADS)

Ganzha, M.; Georgiev, K.; Lirkov, I.; Margenov, S.; Paprzycki, M.

2011-11-01

In our work, we consider the time-dependent Stokes equation on a finite time interval and on a uniform rectangular mesh, written in terms of velocity and pressure. For this problem, a parallel algorithm based on a novel direction-splitting approach is developed. Here, the pressure equation is derived from a perturbed form of the continuity equation, in which the incompressibility constraint is penalized in a negative norm induced by the direction splitting. The scheme used in the algorithm is composed of two parts: (i) velocity prediction and (ii) pressure correction. This is a Crank-Nicolson-type two-stage time integration scheme for two- and three-dimensional parabolic problems in which the second-order derivative with respect to each space variable is treated implicitly while the other variables are treated explicitly at each time sub-step. To achieve good parallel performance, the solution of the Poisson problem for the pressure correction is replaced by the solution of a sequence of one-dimensional second-order elliptic boundary value problems in each spatial direction. The parallel code is implemented using standard MPI functions and tested on two modern parallel computer systems. The numerical tests demonstrate a good level of parallel efficiency and scalability for the direction-splitting-based algorithm.
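The key step, replacing a 2D Poisson-type solve by sequences of 1D solves, can be sketched as follows. The sketch assumes a factorized operator (I - tau*Dxx)(I - tau*Dyy) p = f on a uniform grid with homogeneous Dirichlet boundaries, with lam = tau / h**2 folded into one parameter; the function names, the grid, and the use of the Thomas algorithm for each tridiagonal line solve are our illustrative choices, not details taken from the paper. Within each sweep, every grid line is independent of the others, which is where the parallelism comes from.

```python
from typing import List


def thomas(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Solve a tridiagonal system: a = sub-diagonal (a[0] unused),
    b = diagonal, c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_1d(rhs: List[float], lam: float) -> List[float]:
    """Solve (I - lam * D2) u = rhs, D2 = unscaled second difference, zero Dirichlet BCs."""
    n = len(rhs)
    return thomas([-lam] * n, [1.0 + 2.0 * lam] * n, [-lam] * n, rhs)


def direction_split_solve(f: List[List[float]], lam: float) -> List[List[float]]:
    """Solve (I - lam*Dxx)(I - lam*Dyy) p = f with two sweeps of 1D solves.

    f[i][j]: i indexes x, j indexes y. The 1D solves inside each sweep are
    mutually independent and could be distributed across processes.
    """
    nx, ny = len(f), len(f[0])
    # Sweep 1: (I - lam*Dxx) w = f, one 1D solve per fixed y-index j.
    w = [[0.0] * ny for _ in range(nx)]
    for j in range(ny):
        line = solve_1d([f[i][j] for i in range(nx)], lam)
        for i in range(nx):
            w[i][j] = line[i]
    # Sweep 2: (I - lam*Dyy) p = w, one 1D solve per fixed x-index i.
    return [solve_1d(w[i], lam) for i in range(nx)]
```

Because the x- and y-operators act on different indices of a tensor-product grid they commute, so the two sweeps invert the full factorized operator exactly; the splitting error lies in replacing the unsplit operator by this product, not in the sweeps themselves.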
Dual-processing accounts of reasoning, judgment, and social cognition.

PubMed

Evans, Jonathan St B T

2008-01-01

This article reviews a diverse set of proposals for dual processing in higher cognition within largely disconnected literatures in cognitive and social psychology. All these theories have in common the distinction between cognitive processes that are fast, automatic, and unconscious and those that are slow, deliberative, and conscious. A number of authors have recently suggested that there may be two architecturally (and evolutionarily) distinct cognitive systems underlying these dual-process accounts. However, it emerges that (a) there are multiple kinds of implicit processes described by different theorists and (b) not all of the proposed attributes of the two kinds of processing can be sensibly mapped on to two systems as currently conceived. It is suggested that while some dual-process theories are concerned with parallel competing processes involving explicit and implicit knowledge systems, others are concerned with the influence of preconscious processes that contextualize and shape deliberative reasoning and decision-making.
The Role of AP and the Composition Program.

ERIC Educational Resources Information Center

Mahala, Daniel; Vivion, Michael

1993-01-01

Suggests that most programs have not based their acceptance of advanced placement credit on reasoned endorsement of the views of language, literature, and rhetoric that AP exams present. Criticizes the views implicit in the AP program and shows how they conflict with the goals of one particular college composition program. (RS)
Extending substructure based iterative solvers to multiple load and repeated analyses

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1993-01-01

Direct solvers currently dominate commercial finite element structural software but do not scale well in the fine-granularity regime targeted by emerging parallel processors. Substructure-based iterative solvers (also often called domain decomposition algorithms) lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general-purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right-hand sides. Such systems arise, for example, in multiple-load static analyses and in implicit linear dynamics computations. Direct solvers are well suited to these problems because, once the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. Iterative solvers, on the other hand, are generally ill-suited to these problems because they often must restart from scratch for every new right-hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right-hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. 
As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time step beyond time step 15 is solved in a single iteration and consumes 1.0 second on a 32-processor iPSC/860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
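The reuse idea can be sketched for a small dense symmetric positive definite system: run conjugate gradients (CG) on the first right-hand side, keep its mutually K-conjugate search directions p_i, and start the next solve from the K-orthogonal projection of the new solution onto their span, x0 = sum_i (p_i . b) / (p_i . K p_i) * p_i. This is a simplified, single-process illustration under our own assumptions (dense Python lists, no preconditioner, a hypothetical tridiagonal model matrix); the paper's substructured method goes further by accumulating and K-orthogonalizing directions across many solves, which is omitted here.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(K: Mat, x: Vec) -> Vec:
    return [sum(k * xj for k, xj in zip(row, x)) for row in K]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def cg(K: Mat, b: Vec, x0: Vec, tol: float = 1e-10,
       max_iter: int = 1000) -> Tuple[Vec, int, List[Vec]]:
    """Unpreconditioned CG from x0; returns (x, iterations, search directions)."""
    x = list(x0)
    r = [bi - kxi for bi, kxi in zip(b, matvec(K, x))]
    p = list(r)
    rr = dot(r, r)
    target = tol * dot(b, b) ** 0.5          # relative residual stopping test
    directions: List[Vec] = []
    iters = 0
    while rr ** 0.5 > target and iters < max_iter:
        Kp = matvec(K, p)
        alpha = rr / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        directions.append(p)                 # p is K-conjugate to earlier directions
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iters += 1
    return x, iters, directions


def projected_guess(K: Mat, directions: List[Vec], b: Vec) -> Vec:
    """K-orthogonal projection of K^{-1} b onto span(directions).

    Assumes the stored directions are mutually K-conjugate (true for one CG run).
    """
    x0 = [0.0] * len(b)
    for p in directions:
        coeff = dot(p, b) / dot(p, matvec(K, p))
        x0 = [xi + coeff * pi for xi, pi in zip(x0, p)]
    return x0


n = 20
K = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]                      # hypothetical SPD model stiffness matrix
b1 = [1.0] * n                               # first load case
x1, it1, dirs = cg(K, b1, [0.0] * n)
b2 = [3.0] * n                               # repeated (scaled) load case
x2, it2, _ = cg(K, b2, projected_guess(K, dirs, b2))
print(f"first solve: {it1} iterations, repeated solve: {it2} iterations")
```

When a new right-hand side lies mostly in the previously explored subspace, the projected starting guess already removes most of the residual, so only the remaining component has to be resolved by fresh iterations.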
BOOK REVIEW: Advanced Topics in Computational Partial Differential Equations: Numerical Methods and Diffpack Programming

NASA Astrophysics Data System (ADS)

Katsaounis, T. D.

2005-02-01

The scope of this book is to present well-known simple and advanced numerical methods for solving partial differential equations (PDEs) and to show how to implement these methods using the programming environment of the software package Diffpack. The potential reader needs a basic background in PDEs and numerical methods. Further, a basic knowledge of the finite element method and its implementation in one and two space dimensions is required. The authors claim that no prior knowledge of Diffpack is required, which is true, but the reader should at least be familiar with an object-oriented programming language such as C++ in order to better comprehend the programming environment of Diffpack. Certainly, prior knowledge or usage of Diffpack would be a great advantage to the reader. The book consists of 15 chapters, each written by one or more authors. Each chapter is basically divided into two parts: the first part is about mathematical models described by PDEs and numerical methods to solve these models, and the second part describes how to implement the numerical methods using the programming environment of Diffpack. Each chapter closes with a list of references on its subject. The first nine chapters cover well-known numerical methods for solving the basic types of PDEs. Programming techniques for both the serial and the parallel implementation of numerical methods are also included in these chapters. The last five chapters are dedicated to applications, modelled by PDEs, in a variety of fields. The first chapter is an introduction to parallel processing. It covers the fundamentals of parallel processing in a simple and concrete way, and no prior knowledge of the subject is required. Examples of parallel implementation of basic linear algebra operations are presented using the Message Passing Interface (MPI) programming environment. Here, some knowledge of MPI routines is required of the reader. 
Examples of solving simple PDEs in parallel using Diffpack and MPI are also presented. Chapter 2 presents the overlapping domain decomposition method for solving PDEs. It is well known that these methods are suitable for parallel processing. The first part of the chapter covers the mathematical formulation of the method as well as algorithmic and implementational issues. The second part presents a serial and a parallel implementational framework within the programming environment of Diffpack. The chapter closes by showing how to solve two application examples with the overlapping domain decomposition method using Diffpack. Chapter 3 is a tutorial on how to incorporate the multigrid solver in Diffpack. The method is illustrated by examples such as a Poisson solver, a general elliptic problem with various types of boundary conditions and a nonlinear Poisson-type problem. In chapter 4 the mixed finite element method is introduced. Technical issues concerning the practical implementation of the method are also presented. The main difficulties of implementing the method efficiently, especially in two and three space dimensions on unstructured grids, are presented and addressed in the framework of Diffpack. The implementational process is illustrated by two examples, namely the system formulation of the Poisson problem and the Stokes problem. Chapter 5 is closely related to chapter 4 and addresses how to efficiently solve the linear systems arising from the application of the mixed finite element method. The proposed method is block preconditioning. Efficient techniques for implementing the method within Diffpack are presented. Optimal block preconditioners are used to solve the system formulation of the Poisson problem, the Stokes problem and the bidomain model for the electrical activity in the heart. The subject of chapter 6 is systems of PDEs. Linear and nonlinear systems are discussed. Fully implicit and operator splitting methods are presented. 
Special attention is paid to how existing solvers for scalar equations in Diffpack can be used to derive fully implicit solvers for systems. The proposed techniques are illustrated in terms of two applications, namely a system of PDEs modelling pipeflow and a two-phase porous media flow. Stochastic PDEs are the topic of chapter 7. The first part of the chapter is a simple introduction to stochastic PDEs; basic analytical properties are presented for simple models like transport phenomena and viscous drag forces. The second part considers the numerical solution of stochastic PDEs. Two basic techniques are presented, namely Monte Carlo and perturbation methods. The last part explains how to implement and incorporate these solvers into Diffpack. Chapter 8 describes how to operate Diffpack from Python scripts. The main goal here is to provide all the programming and technical details needed to glue the programming environment of Diffpack to visualization packages through Python, and in general to take advantage of the Python interfaces. Chapter 9 attempts to show how to use numerical experiments to measure the performance of various PDE solvers. The authors gathered a rather impressive list, a total of 14 PDE solvers. Solvers for problems like Poisson, Navier-Stokes, elasticity and two-phase flows, and methods such as finite difference, finite element, multigrid, and gradient-type methods are presented. The authors provide a series of numerical results combining various solvers with various methods in order to gain insight into their computational performance and efficiency. In Chapter 10 the authors consider a computationally challenging problem, namely the computation of the electrical activity of the human heart. After a brief introduction to the biology of the problem, the authors present the mathematical models involved and a numerical method for solving them within the framework of Diffpack. 
Chapters 11 and 12 are closely related; in fact they could have been combined into a single chapter. Chapter 11 introduces several mathematical models used in finance, based on the Black-Scholes equation. Chapter 12 considers several numerical methods like Monte Carlo, lattice methods, finite difference and finite element methods. Implementation of these methods within Diffpack is presented in the last part of the chapter. Chapter 13 presents how the finite element method is used for the modelling and analysis of elastic structures. The authors describe the structural elements of Diffpack, which include popular elements such as beams and plates, and present examples of how to use them to simulate elastic structures. Chapter 14 describes an application problem, namely the extrusion of aluminum. This is a rather complicated process which involves non-Newtonian flow, heat transfer and elasticity. The authors describe the systems of PDEs modelling the underlying process and use a finite element method to obtain a numerical solution. The implementation of the numerical method in Diffpack is presented along with some applications. The last chapter, chapter 15, focuses on mathematical and numerical models of systems of PDEs governing geological processes in sedimentary basins. The underlying mathematical model is solved using the finite element method within a fully implicit scheme. The authors discuss the implementational issues involved within Diffpack and present results from several examples. In summary, the book focuses on the computational and implementational issues involved in solving partial differential equations. The potential reader should have a basic knowledge of PDEs and the finite difference and finite element methods. The examples presented are solved within the programming framework of Diffpack, and the reader should have prior experience with the particular software in order to take full advantage of the book. 
Overall the book is well written, the subject of each chapter is well presented and can serve as a reference for graduate students, researchers and engineers who are interested in the numerical solution of partial differential equations modelling various applications.
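The overlapping domain decomposition idea covered in Chapter 2 can be illustrated, independently of Diffpack, by the classical alternating (multiplicative) Schwarz method for -u'' = 1 on (0, 1) with u(0) = u(1) = 0. The two overlapping subdomains, the grid size, and the number of sweeps below are illustrative assumptions; each subdomain solve takes its interface boundary value from the latest iterate of the other subdomain. Since the exact solution x(1 - x)/2 is quadratic, the second-difference discretization reproduces it exactly at the grid nodes, which makes convergence easy to check.

```python
from typing import List


def thomas(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Tridiagonal solve; a = sub-diagonal (a[0] unused), b = diagonal,
    c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_subdomain(u: List[float], f: List[float], h: float, lo: int, hi: int) -> None:
    """Solve -u'' = f on nodes lo+1..hi-1, using u[lo] and u[hi] as Dirichlet data."""
    m = hi - lo - 1
    rhs = [h * h * f[i] for i in range(lo + 1, hi)]
    rhs[0] += u[lo]      # left boundary value (physical or interface)
    rhs[-1] += u[hi]     # right boundary value (physical or interface)
    u[lo + 1:hi] = thomas([-1.0] * m, [2.0] * m, [-1.0] * m, rhs)


def alternating_schwarz(n: int = 50, left_end: int = 30, right_start: int = 20,
                        sweeps: int = 50) -> List[float]:
    """Multiplicative Schwarz for -u'' = 1 on (0, 1), u(0) = u(1) = 0.

    Subdomain 1 covers nodes 0..left_end and subdomain 2 covers right_start..n;
    they overlap on nodes right_start..left_end.
    """
    h = 1.0 / n
    f = [1.0] * (n + 1)
    u = [0.0] * (n + 1)  # initial guess; u[0] and u[n] hold the physical BCs
    for _ in range(sweeps):
        solve_subdomain(u, f, h, 0, left_end)      # interface value u[left_end] from subdomain 2
        solve_subdomain(u, f, h, right_start, n)   # interface value u[right_start] from subdomain 1
    return u


u = alternating_schwarz()
err = max(abs(ui - (i / 50) * (1 - i / 50) / 2) for i, ui in enumerate(u))
print(f"max error vs exact x(1-x)/2: {err:.2e}")
```

The alternating variant above is inherently sequential; the additive variant solves all subdomains simultaneously from the previous iterate, which is the form that maps naturally onto parallel processors. A wider overlap speeds convergence, since the error contraction per sweep in this 1D setting depends on how far the interfaces sit inside the neighbouring subdomain.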
Application of the θ-method to a telegraphic model of fluid flow in a dual-porosity medium

NASA Astrophysics Data System (ADS)

González-Calderón, Alfredo; Vivas-Cruz, Luis X.; Herrera-Hernández, Erik César

2018-01-01

This work focuses mainly on the study of numerical solutions, which are obtained using the θ-method, of a generalized Warren and Root model that includes a second-order wave-like equation in its formulation. The solutions approximately describe the single-phase hydraulic head in fractures by considering the finite velocity of propagation by means of a Cattaneo-like equation. The corresponding discretized model is obtained by utilizing a non-uniform grid and a non-uniform time step. A simple relationship is proposed to give the time-step distribution. Convergence is analyzed by comparing results from explicit, fully implicit, and Crank-Nicolson schemes with exact solutions: a telegraphic model of fluid flow in a single-porosity reservoir with relaxation dynamics, the Warren and Root model, and our studied model, which is solved with the inverse Laplace transform. We find that the flux and the hydraulic head have spurious oscillations that most often appear in small-time solutions but are attenuated as the solution time progresses. Furthermore, we show that the finite difference method is unable to reproduce the exact flux at time zero. Obtaining results for oilfield production times, which are on the order of months in real units, is only feasible using parallel implicit schemes. In addition, we propose simple parallel algorithms for the memory flux and for the explicit scheme.
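The θ-method interpolates between the schemes compared in the abstract: θ = 0 gives the explicit scheme, θ = 1 the fully implicit scheme, and θ = 1/2 Crank-Nicolson. A minimal sketch on the scalar test equation y' = -lam*y (an assumption for illustration; the paper's telegraphic dual-porosity model is not reproduced here) shows the per-step amplification factor R(z) = (1 - (1 - θ)z) / (1 + θz) with z = lam*dt, which follows from y1 - y0 = -z(θ*y1 + (1 - θ)*y0). The step-size list is allowed to be non-uniform only to mirror the abstract's non-uniform time steps; the authors' step-size relationship is not reproduced.

```python
import math
from typing import Sequence


def theta_factor(z: float, theta: float) -> float:
    """One-step amplification of y' = -lam*y under the theta-method, z = lam*dt."""
    return (1.0 - (1.0 - theta) * z) / (1.0 + theta * z)


def theta_integrate(lam: float, y0: float, dts: Sequence[float], theta: float) -> float:
    """Advance y' = -lam*y over a (possibly non-uniform) sequence of step sizes."""
    y = y0
    for dt in dts:
        y *= theta_factor(lam * dt, theta)
    return y


lam, y0 = 10.0, 1.0
steps = [0.25] * 4                       # T = 1 with z = lam*dt = 2.5 per step
for name, theta in (("explicit", 0.0), ("Crank-Nicolson", 0.5), ("fully implicit", 1.0)):
    print(f"{name:>15}: y(1) = {theta_integrate(lam, y0, steps, theta):+.4e}")
print(f"{'exact':>15}: y(1) = {math.exp(-lam):+.4e}")
```

Crank-Nicolson is second-order accurate and A-stable, but its factor tends to -1 as z grows, so stiff components change sign each step and decay only slowly; the fully implicit factor tends to 0 and damps them strongly, at the cost of first-order accuracy. The explicit factor has magnitude above 1 once z > 2, so large steps are only available to the implicit variants.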
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed

Nadkarni, P M; Miller, P L

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
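The task decomposition behind such a database search (one independent comparison per database entry, farmed out to workers) can be sketched with Python's standard `concurrent.futures` module. The abstract does not specify the comparison algorithm, so the basic Smith-Waterman local-alignment score used here, its scoring parameters (match +2, mismatch -1, linear gap -1), the toy sequences, and the use of a thread pool in place of Linda or Hypercube primitives are all assumptions for illustration. In CPython, threads share one interpreter lock, so a `ProcessPoolExecutor` would be needed for real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score (Smith-Waterman, linear gap penalty), O(len(a)*len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query: str, database: Dict[str, str], workers: int = 4) -> List[Tuple[str, int]]:
    """Farm out one comparison task per database entry; return hits sorted by score."""
    names = list(database)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda name: sw_score(query, database[name]), names))
    return sorted(zip(names, scores), key=lambda hit: hit[1], reverse=True)


db = {"seq1": "TTGACGTTA", "seq2": "CCCCCC", "seq3": "ACGAACGT"}  # hypothetical toy data
print(search("ACGT", db))
```

Because each comparison is independent, the only coordination needed is handing out entries and collecting scores, which is the kind of master-worker pattern that both machine-specific message passing and a portable tuple-space model like Linda can express.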
Parallel programming with Easy Java Simulations

NASA Astrophysics Data System (ADS)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally, programming and algorithm development that utilize the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface, together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Explicit and implicit anti-fat attitudes in children and their relationships with their body images.

PubMed

Solbes, Irene; Enesco, Ileana

2010-02-01

This study aimed to explore the prevalence of negative attitudes toward overweight peers among children using different explicit and implicit measures, and to analyze their relationships with some aspects of their body image. A total of 120 children aged 6-11 years were interviewed using a computer program that simulated a game containing several tasks. Specifically, we applied multiple measures of explicit attitudes toward average-weight/overweight peers, several personal body attitude questions and a child-oriented version of the Implicit Association Test. Our participants showed considerable prejudice and stereotypes against overweight children, at both the explicit and implicit levels. However, we found important differences in the intensity of prejudice and its developmental course as a function of the tasks and the type of measurement used to assess it. Children who grow up in Western societies idealize thinness from an early age and denigrate overweight, with which they explicitly and implicitly associate a series of negative traits that have nothing to do with weight. As they grow older, they seem to reduce their levels of explicit prejudice, but not the intensity of implicit bias. More research is needed to study prejudice and discrimination toward overweight children in depth from a developmental point of view. Copyright 2010 S. Karger AG, Basel.
Explicit and implicit effects of anti-marijuana and anti-tobacco TV advertisements.

PubMed

Czyzewska, Maria; Ginsburg, Harvey J

2007-01-01

Effects of anti-tobacco and anti-marijuana TV advertisements on explicit (i.e., semantic differential ratings) and implicit (i.e., Implicit Association Test, IAT) attitudes toward tobacco and marijuana were compared. Two hundred twenty-nine U.S. college students aged 18 to 19 were randomly assigned to anti-tobacco or anti-marijuana PSA viewing conditions. Participants completed a short survey on attitudes to tobacco and marijuana. Afterwards they watched 15 PSAs embedded in a 15-min science program. At the end, all participants completed an IAT for marijuana, an IAT for tobacco and the assessment of explicit attitudes. Results of ANCOVA revealed a significant interaction between type of TV PSAs watched and implicit attitudes, F(1,223)=7.12, p<0.01 when controlling for preexisting attitudes to both substances; the implicit attitudes were more negative toward the substance that corresponded to the content of the advertisements watched (i.e., anti-tobacco or anti-marijuana). However, an analogous analysis of explicit measures showed that attitudes to marijuana became less negative among students who watched anti-marijuana ads than among the group that watched anti-tobacco ads, F(1,222)=5.79, p<0.02. The discussion focused on the practical and theoretical implications of the observed dissociation between implicit and explicit attitudes to marijuana after exposure to anti-marijuana PSAs.
Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields

NASA Astrophysics Data System (ADS)

Forsythe, Victoriya V.; Makarevich, Roman A.

2016-11-01

An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.
Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

NASA Technical Reports Server (NTRS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

1996-01-01

This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamic and aeroacoustic properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message-passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times that of the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state-of-the-art audio and video rendering of the propagating acoustic signals.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

Intrinsic interactive reinforcement learning - Using error-related potentials for real world human-robot interaction.

PubMed

Kim, Su Kyoung; Kirchner, Elsa Andrea; Stefes, Arne; Kirchner, Frank

2017-12-14

Reinforcement learning (RL) enables robots to learn their optimal behavioral strategy in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (rewards) for RL. Initially we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn to recognize gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection: 90% bACC). In this paper, we demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.
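Balanced accuracy (bACC), the detection metric reported above, is conventionally defined as the mean of sensitivity (true positive rate) and specificity (true negative rate); unlike plain accuracy, it cannot be inflated by always predicting the more frequent class. The sketch below uses that standard definition with hypothetical counts; the paper's exact evaluation protocol is not reproduced.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)   # fraction of positive trials (e.g. ErrP trials) detected
    specificity = tn / (tn + fp)   # fraction of negative trials correctly rejected
    return 0.5 * (sensitivity + specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy, for comparison."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical counts: a detector that never reports an ErrP on an imbalanced set.
print(accuracy(0, 20, 80, 0), balanced_accuracy(0, 20, 80, 0))  # prints "0.8 0.5"
```

A trivial detector thus scores 0.5 bACC regardless of class balance, which is why values such as 90-91% indicate genuine single-trial discrimination rather than a bias toward one class.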
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed Central

Nadkarni, P. M.; Miller, P. L.

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
The metaphysics of D-CTCs: On the underlying assumptions of Deutsch's quantum solution to the paradoxes of time travel

NASA Astrophysics Data System (ADS)

Dunlap, Lucas

2016-11-01

I argue that Deutsch's model for the behavior of systems traveling around closed timelike curves (CTCs) relies implicitly on a substantive metaphysical assumption. Deutsch is employing a version of quantum theory with a significantly supplemented ontology of parallel existent worlds, which differ in kind from the many worlds of the Everett interpretation. Standard Everett does not support the existence of multiple identical copies of the world, which the D-CTC model requires. This has been obscured because he often refers to the branching structure of Everett as a "multiverse", and describes quantum interference by reference to parallel interacting definite worlds. But he admits that this is only an approximation to Everett. The D-CTC model, however, relies crucially on the existence of a multiverse of parallel interacting worlds. Since his model is supplemented by structures that go significantly beyond quantum theory, and play an ineliminable role in its predictions and explanations, it does not represent a quantum solution to the paradoxes of time travel.
High speed parallel spectral-domain OCT using spectrally encoded line-field illumination

NASA Astrophysics Data System (ADS)

Lee, Kye-Sung; Hur, Hwan; Bae, Ji Yong; Kim, I. Jong; Kim, Dong Uk; Nam, Ki-Hwan; Kim, Geon-Hee; Chang, Ki Soo

2018-01-01

We report parallel spectral-domain optical coherence tomography (OCT) at 500 000 A-scan/s. This is the highest-speed spectral-domain (SD) OCT system using a single line camera. Spectrally encoded line-field scanning is proposed to increase the imaging speed in SD-OCT effectively, and the tradeoff between speed, depth range, and sensitivity is demonstrated. We show that three imaging modes of 125k, 250k, and 500k A-scan/s can be simply switched according to the sample to be imaged considering the depth range and sensitivity. To demonstrate the biological imaging performance of the high-speed imaging modes of the spectrally encoded line-field OCT system, human skin and a whole leaf were imaged at the speed of 250k and 500k A-scan/s, respectively. In addition, there is no sensitivity dependence in the B-scan direction, which is implicit in line-field parallel OCT using line focusing of a Gaussian beam with a cylindrical lens.
Bilingual parallel programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.; Overbeek, R.

1990-01-01

Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date (whether parallelizing compilers, language extensions, or new concurrent languages) seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
Implementation of a partitioned algorithm for simulation of large CSI problems

NASA Technical Reports Server (NTRS)

Alvin, Kenneth F.; Park, K. C.

1991-01-01

The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
Fully-Implicit Navier-Stokes (FIN-S)

NASA Technical Reports Server (NTRS)

Kirk, Benjamin S.

2010-01-01

FIN-S is a SUPG finite element code for flow problems under active development at NASA Lyndon B. Johnson Space Center and within PECOS: a) The code is built on top of the libMesh parallel, adaptive finite element library. b) The initial implementation of the code targeted supersonic/hypersonic laminar calorically perfect gas flows & conjugate heat transfer. c) Initial extension to thermochemical nonequilibrium about 9 months ago. d) The technologies in FIN-S have been enhanced through a strongly collaborative research effort with Sandia National Labs.
Parallel Tempering of Dark Matter from the Ebola Virus Proteome: Comparison of CHARMM36m and CHARMM22 Force Fields with Implicit Solvent and Coarse Grained Model

DTIC Science & Technology

2017-08-10

simulation models the conformational plasticity along the helix-forming reaction coordinate was limited by free-energy barriers. By comparison the coarse...revealed. The latter becomes evident in comparing the energy Z-score landscapes, where CHARMM22 simulation shows a manifold of shuttling...solvent simulations of calculating the charging free energy of protein conformations.[33] Deviation to the protocol by modification of Born radii
Numerical simulation of h-adaptive immersed boundary method for freely falling disks

NASA Astrophysics Data System (ADS)

Zhang, Pan; Xia, Zhenhua; Cai, Qingdong

2018-05-01

In this work, a freely falling disk with aspect ratio 1/10 is directly simulated by using an adaptive numerical model implemented on a parallel computation framework JASMIN. The adaptive numerical model is a combination of the h-adaptive mesh refinement technique and the implicit immersed boundary method (IBM). Our numerical results agree well with the experimental results in all of the six degrees of freedom of the disk. Furthermore, very similar vortex structures observed in the experiment were also obtained.
Comparison of Implicit Collocation Methods for the Heat Equation

NASA Technical Reports Server (NTRS)

Kouatchou, Jules; Jezequel, Fabienne; Zukor, Dorothy (Technical Monitor)

2001-01-01

We combine a high-order compact finite difference scheme to approximate spatial derivatives and collocation techniques for the time component to numerically solve the two-dimensional heat equation. We use two approaches to implement the collocation methods. The first one is based on an explicit computation of the coefficients of polynomials and the second one relies on differential quadrature. We compare them by studying their merits and analyzing their numerical performance. All our computations, based on parallel algorithms, are carried out on the CRAY SV1.
Calculation of transonic aileron buzz

NASA Technical Reports Server (NTRS)

Steger, J. L.; Bailey, H. E.

1979-01-01

An implicit finite-difference computer code that uses a two-layer algebraic eddy viscosity model and exact geometric specification of the airfoil has been used to simulate transonic aileron buzz. The calculations, which were performed on both the Illiac IV parallel computer and the Control Data 7600 computer, are in essential agreement with the original expository wind-tunnel data taken in the Ames 16-Foot Wind Tunnel just after World War II. These results and a description of the pertinent numerical techniques are included.
Study Abroad Programs as Tools of Internationalization: Which Factors Influence Hungarian Business Students to Participate?

ERIC Educational Resources Information Center

Huják, Janka

2015-01-01

The internationalization of higher education has been on the agenda for decades now all over the world. Study abroad programs are undoubtedly tools of the internationalization endeavors. The ERASMUS Student Mobility Program is one of the flagships of the European Union's educational exchange programs implicitly aiming for the internationalization…
The Impact of Modernization Programs on Academic Teachers' Work: A Mexican Case Study

ERIC Educational Resources Information Center

Zavala, Blanca Arciga

2006-01-01

For more than ten years, academics of public universities in Mexico have endured modernization programs that promote individual productivity and operate as a mechanism of selection and assessment. The implementation of the programs has exposed a tension between the values implicit in the programs and the values of the academic teachers. There is a…
The identification of implicit theories in domestic violence perpetrators.

PubMed

Dempsey, Bernadette; Day, Andrew

2011-05-01

An understanding of how the beliefs of domestically violent offenders might influence their abusive behavior is central to the development and delivery of any intervention program that aims to reduce the risk of further violence against women and children. This article reports the results of a preliminary investigation into the core beliefs of a sample of domestically violent men. Three major themes emerged from an analysis of the accounts of their violence, which were understood in relation to three implicit theories that participants held about themselves, their relationships, and the world. These are discussed in terms of previous studies of offender cognition, how domestic violence programs might be conceptualized, and their implications for practice.
Ablation, Thermal Response, and Chemistry Program for Analysis of Thermal Protection Systems

NASA Technical Reports Server (NTRS)

Milos, Frank S.; Chen, Yih-Kanq

2010-01-01

In previous work, the authors documented the Multicomponent Ablation Thermochemistry (MAT) and Fully Implicit Ablation and Thermal response (FIAT) programs. In this work, key features from MAT and FIAT were combined to create the new Fully Implicit Ablation, Thermal response, and Chemistry (FIATC) program. FIATC is fully compatible with FIAT (version 2.5) but has expanded capabilities to compute the multispecies surface chemistry and ablation rate as part of the surface energy balance. This new methodology eliminates B' tables, provides blown species fractions as a function of time, and enables calculations that would otherwise be impractical (e.g. 4+ dimensional tables) such as pyrolysis and ablation with kinetic rates or unequal diffusion coefficients. Equations and solution procedures are presented, then representative calculations of equilibrium and finite-rate ablation in flight and ground-test environments are discussed.
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Implicit bias and its relation to health disparities: a teaching program and survey of medical students.

PubMed

Gonzalez, Cristina M; Kim, Mimi Y; Marantz, Paul R

2014-01-01

The varying treatment of different patients by the same physician is referred to as within-provider disparities. These differences can contribute to health disparities and are thought to be the result of implicit bias due to unintentional, unconscious assumptions. The purpose is to describe an educational intervention addressing both health disparities and physician implicit bias and the results of a subsequent survey exploring medical students' attitudes and beliefs toward subconscious bias and health disparities. A single session within a larger required course was devoted to health disparities and the physician's potential to contribute to health disparities through implicit bias. Following the session the students were anonymously surveyed on their Implicit Association Test (IAT) results, their attitudes and experiences regarding the fairness of the health care system, and the potential impact of their own implicit bias. The students were categorized based on whether they disagreed ("deniers") or agreed ("accepters") with the statement "Unconscious bias might affect some of my clinical decisions or behaviors." Data analysis focused specifically on factors associated with this perspective. The survey response rate was at least 69%. Of the responders, 22% were "deniers" and 77% were "accepters." Demographics between the two groups were not significantly different. Deniers were significantly more likely than accepters to report IAT results with implicit preferences toward self, to believe the IAT is invalid, and to believe that doctors and the health system provide equal care to all, and were less likely to report having directly observed inequitable care. The recognition of bias cannot be taught in a single session. Our experience supports the value of teaching medical students to recognize their own implicit biases and develop skills to overcome them in each patient encounter, and in making this instruction part of the compulsory, longitudinal undergraduate medical curriculum.
Male Perpetrators of Intimate Partner Violence and Implicit Attitudes toward Violence: Associations with Treatment Outcomes

PubMed Central

Eckhardt, Christopher I.; Crane, Cory A.

2014-01-01

The present study examined the associations among implicit attitudes toward factors related to intimate partner violence (IPV) and objective, behavioral outcomes of participants legally mandated to attend partner violence interventions. Twenty-six male offenders, adjudicated within the past month on IPV charges, completed three sets of gender- and violence-themed implicit association tests (IATs) to evaluate the relationships between implicit evaluations of women and violence and three key outcome measures assessed six months after enrollment in the study: self-reported prior year IPV perpetration, completion of a court-mandated partner abuse program, and criminal reoffending. IAT results indicated that more rapid associations between violence-related words and positive valences, rather than gender evaluations or associations between gender and violence, were associated with greater IPV perpetration during the year prior to involvement in the study as well as with poorer outcomes (i.e., greater treatment non-compliance and criminal recidivism) at the 6-month follow-up. Among explicit measures, only negative partner violence outcome expectancies were marginally associated with treatment compliance. None of the explicit measures predicted previous violence or recidivism. The findings are discussed in the context of reducing violence through promoting implicit cognitive change. PMID:25598562
Implicit Messages to Teen-Aged Viewers.

ERIC Educational Resources Information Center

Abrahamsson, Ulla B.

Examples from the data of a study of television programming for adolescents in Sweden illustrate some of the differences in the ways programs address their male and female viewers. Whereas boy and girl characters in television programs are roughly equal in number, the distribution changes when only leading roles are considered. A marked imbalance…
Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Lesoinne, Michel

1993-01-01

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
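The abstract does not detail the partitioning algorithms themselves. As a hedged illustration of what automatic mesh partitioning does in practice, the sketch below uses recursive coordinate bisection (RCB), a simpler, well-known geometric technique swapped in here; it is not the family of algorithms presented in the paper. It works on element centroids only, ignores interface (communication) cost, and assumes the number of subdomains is a power of two; the function name is hypothetical.

```python
import numpy as np


def recursive_coordinate_bisection(centroids, n_parts):
    """Assign each element centroid (rows of an (n, d) array) to a subdomain.

    Hypothetical helper: recursive coordinate bisection, with n_parts assumed
    to be a power of two so every level makes an even median cut.
    """
    if n_parts < 1 or n_parts & (n_parts - 1):
        raise ValueError("n_parts must be a power of two")
    parts = np.zeros(len(centroids), dtype=int)

    def bisect(indices, first_label, k):
        if k == 1 or len(indices) == 0:
            parts[indices] = first_label
            return
        pts = centroids[indices]
        # Cut across the coordinate direction with the largest extent.
        axis = np.argmax(pts.max(axis=0) - pts.min(axis=0))
        order = indices[np.argsort(pts[:, axis], kind="stable")]
        half = len(order) // 2  # median split keeps the halves balanced
        bisect(order[:half], first_label, k // 2)
        bisect(order[half:], first_label + k // 2, k // 2)

    bisect(np.arange(len(centroids)), 0, n_parts)
    return parts
```

For 1,000 random centroids and 8 subdomains, every subdomain receives exactly 125 elements, since each median cut splits an even count in half. Production partitioners also try to minimize the number of elements on subdomain interfaces, which plain RCB does not attempt.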

Practical aspects of prestack depth migration with finite differences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ober, C.C.; Oldfield, R.A.; Womble, D.E.

1997-07-01

Finite-difference prestack depth migration offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented, and discuss here, a finite-difference prestack depth migration algorithm for use on massively parallel computers. The image quality of the finite-difference scheme has been investigated, and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite-difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatial parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.
Optimization of groundwater artificial recharge systems using a genetic algorithm: a case study in Beijing, China

NASA Astrophysics Data System (ADS)

Hao, Qichen; Shao, Jingli; Cui, Yali; Zhang, Qiulan; Huang, Linxian

2018-05-01

An optimization approach is used for the operation of groundwater artificial recharge systems in an alluvial fan in Beijing, China. The optimization model incorporates a transient groundwater flow model, which allows for simulation of the groundwater response to artificial recharge. The facilities' operation with regard to recharge rates is formulated as a nonlinear programming problem to maximize the volume of surface water recharged into the aquifers under specific constraints. This optimization problem is solved by the parallel genetic algorithm (PGA) based on OpenMP, which could substantially reduce the computation time. To solve the PGA with constraints, the multiplicative penalty method is applied. In addition, the facilities' locations are implicitly determined on the basis of the results of the recharge-rate optimizations. Two scenarios are optimized and the optimal results indicate that the amount of water recharged into the aquifers will increase without exceeding the upper limits of the groundwater levels. Optimal operation of this artificial recharge system can also contribute to the more effective recovery of the groundwater storage capacity.
FINIFLUX: An implicit finite element model for quantification of groundwater fluxes and hyporheic exchange in streams and rivers using radon

NASA Astrophysics Data System (ADS)

Frei, S.; Gilfedder, B. S.

2015-08-01

A quantitative understanding of groundwater-surface water interactions is vital for sustainable management of water quantity and quality. The noble gas radon-222 (Rn) is increasingly being used as a sensitive tracer to quantify groundwater discharge to wetlands, lakes, and rivers: a development driven by technical and methodological advances in Rn measurement. However, quantitative interpretation of these data is not trivial, and the methods used to date are based on the simplest solutions to the mass balance equation (e.g., first-order finite difference and inversion). Here we present a new implicit numerical model (FINIFLUX) based on finite elements for quantifying groundwater discharge to streams and rivers using Rn surveys at the reach scale (1-50 km). The model is coupled to a state-of-the-art parameter optimization code, Parallel-PEST, to iteratively solve the mass balance equation for groundwater discharge and hyporheic exchange. The major benefit of this model is that it is programmed to be very simple to use, reduces nonuniqueness, and provides numerically stable estimates of groundwater fluxes and hyporheic residence times from field data. FINIFLUX was tested against an analytical solution and then implemented on two German rivers of differing magnitude, the Salzach (~112 m³ s⁻¹) and the Rote Main (~4 m³ s⁻¹). We show that with previous inversion techniques numerical instability can lead to physically impossible negative values, whereas the new model provides stable positive values for all scenarios. We hope that by making FINIFLUX freely available to the community, Rn might find wider application in quantifying groundwater discharge to streams and rivers and thus assist in the combined management of surface and groundwater systems.
Flexible parallel implicit modelling of coupled thermal-hydraulic-mechanical processes in fractured rocks

NASA Astrophysics Data System (ADS)

Cacace, Mauro; Jacquey, Antoine B.

2017-09-01

Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock, and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture-solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment), which provides extensive, scalable, parallel, and implicit coupling for solving the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton-Raphson or by Jacobian-free inexact Newton-Krylov schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).
Parallel Computing of Upwelling in a Rotating Stratified Flow

NASA Astrophysics Data System (ADS)

Cui, A.; Street, R. L.

1997-11-01

A code for three-dimensional, unsteady, incompressible, turbulent flow has been implemented on the IBM SP2, using message passing. The effects of rotation and variable density are included. A finite volume method is used to discretize the Navier-Stokes equations in general curvilinear coordinates on a non-staggered grid. All the spatial derivatives are approximated using second-order central differences, with the exception of the convection terms, which are handled with special upwind-difference schemes. The semi-implicit, second-order accurate time-advancement scheme employs the Adams-Bashforth method for the explicit terms and Crank-Nicolson for the implicit terms. A multigrid method, with the four-color ZEBRA as smoother, is used to solve the Poisson equation for pressure, while the momentum equations are solved with an approximate factorization technique. The code was successfully validated for a variety of test cases. Simulations of a laboratory model of coastal upwelling in a rotating annulus are in progress and will be presented.
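The time-advancement scheme described above treats some terms explicitly with second-order Adams-Bashforth (AB2) and others implicitly with Crank-Nicolson (CN). A minimal sketch of that combination, assuming a scalar model problem du/dt = lam*u + N(u, t) in which lam*u stands in for the implicit terms and N for the explicit ones; the model problem, the forward-Euler start, and the function name are assumptions for illustration, not the authors' code.

```python
import math
from typing import Callable


def ab2_cn(u0: float, lam: float, explicit_rhs: Callable[[float, float], float],
           dt: float, n_steps: int) -> float:
    """Integrate du/dt = lam * u + explicit_rhs(u, t) from t = 0.

    The linear term lam * u is treated implicitly with Crank-Nicolson and
    explicit_rhs is extrapolated with second-order Adams-Bashforth. The first
    step has no history, so it reuses the current value (a forward-Euler start).
    """
    u, t = u0, 0.0
    n_prev = explicit_rhs(u, t)
    for _ in range(n_steps):
        n_curr = explicit_rhs(u, t)
        extrapolated = 1.5 * n_curr - 0.5 * n_prev  # AB2 explicit part
        # CN: (1 - dt*lam/2) u_new = (1 + dt*lam/2) u + dt * extrapolated
        u = ((1.0 + 0.5 * dt * lam) * u + dt * extrapolated) / (1.0 - 0.5 * dt * lam)
        n_prev = n_curr
        t += dt
    return u


if __name__ == "__main__":
    lam, mu = -10.0, 1.0        # implicit and explicit rates of the model problem
    exact = math.exp(lam + mu)  # u(1) when u(0) = 1
    for n in (100, 200):
        approx = ab2_cn(1.0, lam, lambda u, t: mu * u, 1.0 / n, n)
        print(n, abs(approx - exact))
```

On this linear test problem, halving the step size should reduce the error at t = 1 by roughly a factor of four, as expected of a second-order scheme.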
IMEX HDG-DG: A coupled implicit hybridized discontinuous Galerkin and explicit discontinuous Galerkin approach for Euler systems on cubed sphere.

NASA Astrophysics Data System (ADS)

Kang, S.; Muralikrishnan, S.; Bui-Thanh, T.

2017-12-01

We propose IMEX HDG-DG schemes for Euler systems on the cubed sphere. Of interest is subsonic flow, where the speed of the acoustic wave is faster than that of the nonlinear advection. In order to simulate these flows efficiently, we split the governing system into a stiff part describing the fast waves and a non-stiff part associated with nonlinear advection. The former is discretized implicitly with the HDG method, while an explicit Runge-Kutta DG discretization is employed for the latter. The proposed IMEX HDG-DG framework: 1) facilitates high-order solutions in both time and space; 2) avoids overly small time step sizes; 3) requires only one linear system solve per time step; and 4) relative to DG, generates smaller and sparser linear systems while promoting further parallelism owing to the HDG discretization. Numerical results for various test cases demonstrate that our methods are comparable to explicit Runge-Kutta DG schemes in terms of accuracy, while allowing for much larger time step sizes.
Time-asymptotic solutions of the Navier-Stokes equation for free shear flows using an alternating-direction implicit method

NASA Technical Reports Server (NTRS)

Rudy, D. H.; Morris, D. J.

1976-01-01

An uncoupled, time-asymptotic, alternating-direction implicit method for solving the Navier-Stokes equations was tested on two laminar parallel mixing flows. A constant total temperature was assumed in order to eliminate the need to solve the full energy equation; consequently, static temperature was evaluated by using an algebraic relationship. For the mixing of two supersonic streams at a Reynolds number of 1,000, convergent solutions were obtained for a time step 5 times the maximum allowable size for an explicit method. The solution diverged for a time step 10 times the explicit limit. Improved convergence was obtained when upwind differencing was used for convective terms. Larger time steps were not possible with either upwind differencing or the diagonally dominant scheme. Artificial viscosity was added to the continuity equation in order to eliminate divergence for the mixing of a subsonic stream with a supersonic stream at a Reynolds number of 1,000.
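The abstract applies an alternating-direction implicit (ADI) method to the Navier-Stokes equations. As a much simpler, hedged illustration of the ADI idea itself, the sketch below advances the two-dimensional heat equation u_t = u_xx + u_yy on the unit square with zero boundary values, using the Peaceman-Rachford splitting: each half step is implicit in one direction only, so it reduces to independent tridiagonal systems solved with the Thomas algorithm. The model problem, grid, initial condition, and function names are assumptions, not the authors' code.

```python
import numpy as np


def second_difference(u, axis):
    """Undivided second difference u[i-1] - 2u[i] + u[i+1] along `axis`,
    with zero (Dirichlet) values outside the interior grid."""
    p = np.pad(u, 1)  # zero padding on every side
    centre = p[1:-1, 1:-1]
    if axis == 0:
        return p[:-2, 1:-1] - 2.0 * centre + p[2:, 1:-1]
    return p[1:-1, :-2] - 2.0 * centre + p[1:-1, 2:]


def solve_tridiagonal(off, diag, rhs):
    """Thomas algorithm for a constant tridiagonal matrix (`diag` on the
    diagonal, `off` on both off-diagonals); each column of rhs is solved
    as an independent system along axis 0."""
    n = rhs.shape[0]
    cp = np.empty(n)
    dp = np.empty(rhs.shape)
    cp[0] = off / diag
    dp[0] = rhs[0] / diag
    for i in range(1, n):  # forward elimination
        denom = diag - off * cp[i - 1]
        cp[i] = off / denom
        dp[i] = (rhs[i] - off * dp[i - 1]) / denom
    x = np.empty(rhs.shape)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def adi_step(u, s):
    """One Peaceman-Rachford step with s = dt / (2 h^2)."""
    # Half step 1: implicit in x (axis 0), explicit in y (axis 1).
    u_half = solve_tridiagonal(-s, 1.0 + 2.0 * s, u + s * second_difference(u, 1))
    # Half step 2: implicit in y, explicit in x; transpose so y becomes axis 0.
    rhs = u_half + s * second_difference(u_half, 0)
    return solve_tridiagonal(-s, 1.0 + 2.0 * s, rhs.T).T


def heat_adi(n, dt, t_end):
    """Solve u_t = u_xx + u_yy on the unit square (n intervals per side),
    u = 0 on the boundary, u(x, y, 0) = sin(pi x) sin(pi y)."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]  # interior nodes only
    X, Y = np.meshgrid(x, x, indexing="ij")
    u = np.sin(np.pi * X) * np.sin(np.pi * Y)
    s = dt / (2.0 * h * h)
    for _ in range(int(round(t_end / dt))):
        u = adi_step(u, s)
    return X, Y, u
```

For this linear model problem the exact solution is exp(-2π²t) sin(πx) sin(πy), and the scheme is unconditionally stable: a step of dt = 0.05 on a 40 × 40 grid, far above the explicit limit h²/4 ≈ 1.6 × 10⁻⁴, still decays instead of growing. The nonlinear problem in the abstract behaves differently, since there the solution diverged at 10 times the explicit limit.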
Partitioning problems in parallel, pipelined and distributed computing

NASA Technical Reports Server (NTRS)

Bokhari, S.

1985-01-01

The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
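The abstract does not spell out the Sum-Bottleneck path algorithm, so the sketch below does not reproduce it. As a related and much simpler illustration of a bottleneck-minimizing partition, it assigns a chain of module weights to at most k chain-connected processors in contiguous blocks so that the most heavily loaded processor is as light as possible, using a plain dynamic program. Treating module weights as execution times and ignoring communication costs are simplifying assumptions; the function name is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def min_bottleneck_chain_partition(
    weights: Sequence[float], k: int
) -> Tuple[float, List[List[float]]]:
    """Split a chain of module weights into at most k contiguous blocks
    (one per processor) so that the heaviest block is as light as possible.

    Plain O(k * n^2) dynamic program; communication costs are ignored.
    """
    n = len(weights)
    prefix = [0.0, *accumulate(weights)]
    inf = float("inf")
    # best[j][i]: smallest bottleneck for the first i modules on j processors.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(n + 1):
            for s in range(i + 1):  # last block holds modules s .. i-1 (may be empty)
                candidate = max(best[j - 1][s], prefix[i] - prefix[s])
                if candidate < best[j][i]:
                    best[j][i] = candidate
                    start[j][i] = s
    # Walk the stored block starts backwards to recover the blocks.
    blocks, i = [], n
    for j in range(k, 0, -1):
        s = start[j][i]
        if s < i:
            blocks.append(list(weights[s:i]))
        i = s
    blocks.reverse()
    return best[k][n], blocks
```

For module weights 1 through 9 on three processors, the minimum achievable bottleneck is 17, for instance with the blocks [1, 2, 3, 4, 5], [6, 7], [8, 9].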
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
MOOSE: A PARALLEL COMPUTATIONAL FRAMEWORK FOR COUPLED SYSTEMS OF NONLINEAR EQUATIONS.

DOE Office of Scientific and Technical Information (OSTI.GOV)

G. Hansen; C. Newman; D. Gaston

Systems of coupled, nonlinear partial differential equations often arise in simulation of nuclear processes. MOOSE (Multiphysics Object Oriented Simulation Environment), a parallel computational framework targeted at solving these systems, is presented. As opposed to traditional data-flow oriented computational frameworks, MOOSE is instead founded on mathematics based on Jacobian-free Newton Krylov (JFNK). Utilizing the mathematical structure present in JFNK, physics are modularized into "Kernels", allowing for rapid production of new simulation tools. In addition, systems are solved fully coupled and fully implicit, employing physics-based preconditioning that allows for a large amount of flexibility even with large variance in time scales. Background on the mathematics, an inspection of the structure of MOOSE, and several representative solutions from applications built on the framework are presented.
MOOSE: A parallel computational framework for coupled systems of nonlinear equations.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Derek Gaston; Chris Newman; Glen Hansen

Systems of coupled, nonlinear partial differential equations (PDEs) often arise in simulation of nuclear processes. MOOSE: Multiphysics Object Oriented Simulation Environment, a parallel computational framework targeted at the solution of such systems, is presented. As opposed to traditional data-flow oriented computational frameworks, MOOSE is instead founded on the mathematical principle of Jacobian-free Newton-Krylov (JFNK) solution methods. Utilizing the mathematical structure present in JFNK, physics expressions are modularized into "Kernels," allowing for rapid production of new simulation tools. In addition, systems are solved implicitly and fully coupled, employing physics based preconditioning, which provides great flexibility even with large variance in time scales. A summary of the mathematics, an overview of the structure of MOOSE, and several representative solutions from applications built on the framework are presented.
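The MOOSE abstracts describe a framework founded on Jacobian-free Newton-Krylov (JFNK) solution. As a hedged, minimal sketch of that idea (not MOOSE's implementation or API), the code below wraps Newton's method around a small, unrestarted, unpreconditioned GMRES that only ever requests Jacobian-vector products, each approximated by a finite difference of the residual. The function names, tolerances, and the finite-difference step rule are illustrative assumptions, and the physics-based preconditioning mentioned in the abstracts is omitted entirely.

```python
import numpy as np


def gmres(matvec, b, max_iter=50, tol=1e-12):
    """Unrestarted, unpreconditioned GMRES that touches the matrix only
    through matvec(v); returns an approximate solution of A x = b."""
    n = len(b)
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(n)
    m = min(max_iter, n)
    V = np.zeros((n, m + 1))  # orthonormal Krylov basis
    H = np.zeros((m + 1, m))  # upper Hessenberg matrix from Arnoldi
    V[:, 0] = b / beta
    for k in range(m):
        w = matvec(V[:, k])
        for j in range(k + 1):  # modified Gram-Schmidt (Arnoldi process)
            H[j, k] = V[:, j] @ w
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        # Least-squares problem min ||beta * e1 - H y|| on the current subspace.
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        if np.linalg.norm(H[:k + 2, :k + 1] @ y - e1) <= tol * beta or H[k + 1, k] == 0.0:
            break
        V[:, k + 1] = w / H[k + 1, k]
    return V[:, :k + 1] @ y


def jfnk(residual, u0, newton_tol=1e-10, max_newton=30):
    """Jacobian-free Newton-Krylov: Newton steps whose linear systems
    J du = -F are solved by GMRES using finite-difference J*v products."""
    u = np.array(u0, dtype=float)
    for _ in range(max_newton):
        f = residual(u)
        if np.linalg.norm(f) < newton_tol:
            break

        def jv(v, u=u, f=f):
            # J(u) v ~ (F(u + eps v) - F(u)) / eps; the Jacobian is never formed.
            eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / np.linalg.norm(v)
            return (residual(u + eps * v) - f) / eps

        u = u + gmres(jv, -f)
    return u
```

For example, `jfnk(lambda u: np.array([u[0]**2 + u[1]**2 - 4.0, u[0] - u[1]]), [1.0, 0.5])` converges to approximately (√2, √2) without a Jacobian ever being written down.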
Effects of Individualized Word Retrieval in Kindergarten Vocabulary Intervention

ERIC Educational Resources Information Center

Damhuis, Carmen M. P.; Segers, Eliane; Scheltinga, Femke; Verhoeven, Ludo

2016-01-01

We examined the effects of adaptive word retrieval intervention on a classroom vocabulary program on children's vocabulary acquisition in kindergarten. In the experimental condition, word retrieval was provided in a classroom vocabulary program, combining implicit and explicit vocabulary instructions. Children performed extra word retrieval…
An Analysis of the Selection and Distribution of Knowledge in Massachusetts Music Teacher Preparation Programs: The Song Remains the Same

ERIC Educational Resources Information Center

Borek, Matthew

2012-01-01

Music teachers occupy a conflicted and contested position in many secondary schools, and music teacher education programs have been given the task of preparing students to enter this challenging environment. This qualitative dissertation study examined the explicit, implicit, and null curricula of music teacher preparation programs in…
Large scale database scrubbing using object oriented software components.

PubMed

Herting, R L; Barnes, M R

1998-01-01

Now that case managers, quality improvement teams, and researchers use medical databases extensively, the ability to share and disseminate such databases while maintaining patient confidentiality is paramount. A process called scrubbing addresses this problem by removing personally identifying information while keeping the integrity of the medical information intact. Scrubbing entire databases, containing multiple tables, requires that the implicit relationships between data elements in different tables of the database be maintained. To address this issue we developed DBScrub, a Java program that interfaces with any JDBC compliant database and scrubs the database while maintaining the implicit relationships within it. DBScrub uses a small number of highly configurable object-oriented software components to carry out the scrubbing. We describe the structure of these software components and how they maintain the implicit relationships within the database.
Infusing pleasure: Mood effects of the consumption of a single cup of tea.

PubMed

Einöther, Suzanne J L; Rowson, Matthew; Ramaekers, Johannes G; Giesbrecht, Timo

2016-08-01

Tea has historically been associated with mood benefits. Nevertheless, few studies have empirically investigated mood changes after tea consumption. We explored immediate effects of a single cup of tea up to an hour post-consumption on self-reported valence, arousal, discrete emotions, and implicit measures of mood. In a parallel group design, 153 participants received a cup of tea or placebo tea, or a glass of water. Immediately (i.e. 5 min) after consumption, tea increased valence but reduced arousal, as compared to the placebo. There were no differences at later time points. Discrete emotions did not differ significantly between conditions, immediately or over time. Water consumption increased implicit positivity as compared to placebo. Finally, consumption of tea and water resulted in higher interest in activities overall and in specific activity types compared to placebo. The present study shows that effects of a single cup of tea may be limited to an immediate increase in pleasure and decrease in arousal, which can increase interest in activities. Differences between tea and water were not significant, while differences between water and placebo on implicit measures were unexpected. More servings over a longer time may be required to evoke tea's arousing effects and appropriate tea consumption settings may evoke more enduring valence effects. Copyright © 2016 Elsevier Ltd. All rights reserved.
An interactive parallel programming environment applied in atmospheric science

NASA Technical Reports Server (NTRS)

vonLaszewski, G.

1996-01-01

This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
15 CFR 930.116 - Judicial review.

Code of Federal Regulations, 2010 CFR

2010-01-01

... FEDERAL CONSISTENCY WITH APPROVED COASTAL MANAGEMENT PROGRAMS Secretarial Mediation § 930.116 Judicial... implicitly to limit the parties' use of alternate forums to resolve disputes. Specifically, judicial review...
Multitasking the INS3D-LU code on the Cray Y-MP

NASA Technical Reports Server (NTRS)

Fatoohi, Rod; Yoon, Seokkwan

1991-01-01

This paper presents the results of multitasking the INS3D-LU code on eight processors. The code is a full Navier-Stokes solver for incompressible fluid in three dimensional generalized coordinates using a lower-upper symmetric-Gauss-Seidel implicit scheme. This code has been fully vectorized on oblique planes of sweep and parallelized using autotasking with some directives and minor modifications. The timing results for five grid sizes are presented and analyzed. The code has achieved a processing rate of over one Gflops.
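The idea behind vectorizing on oblique planes can be sketched under one assumption: the lower sweep of an LU-SGS scheme reads the already-updated values at the three upstream neighbours (i-1, j, k), (i, j-1, k) and (i, j, k-1). Every such neighbour lies on the hyperplane i + j + k = l - 1. All points on plane l are therefore mutually independent and can be processed as one vector operation. Grid sizes and helper names here are illustrative only.

```python
from itertools import product

def oblique_planes(ni, nj, nk):
    """Group grid points (i, j, k) by the hyperplane index l = i + j + k."""
    planes = [[] for _ in range(ni + nj + nk - 2)]
    for i, j, k in product(range(ni), range(nj), range(nk)):
        planes[i + j + k].append((i, j, k))
    return planes

def lower_sweep_dependencies(i, j, k):
    """Upstream neighbours read by the lower sweep (assumed 7-point stencil)."""
    return [(i - 1, j, k), (i, j - 1, k), (i, j, k - 1)]

planes = oblique_planes(4, 3, 2)
for plane in planes:
    members = set(plane)
    # No point on a plane depends on another point of the same plane,
    # so each plane can be updated in one vectorized step.
    assert all(dep not in members for p in plane for dep in lower_sweep_dependencies(*p))
```

The planes are short near the grid corners and longest in the middle. This uneven vector length is one reason hyperplane sweeps rarely reach peak vector speed.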
Adagio 4.20 User’s Guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Spencer, Benjamin Whiting; Crane, Nathan K.; Heinstein, Martin W.

2011-03-01

Adagio is a Lagrangian, three-dimensional, implicit code for the analysis of solids and structures. It uses a multi-level iterative solver, which enables it to solve problems with large deformations, nonlinear material behavior, and contact. It also has a versatile library of continuum and structural elements, and an extensive library of material models. Adagio is written for parallel computing environments, and its solvers allow for scalable solutions of very large problems. Adagio uses the SIERRA Framework, which allows for coupling with other SIERRA mechanics codes. This document describes the functionality and input structure for Adagio.

Parallelizing a peanut butter sandwich

NASA Astrophysics Data System (ADS)

Quenette, S. M.

2005-12-01

This poster aims to demonstrate, in a novel way, why contemporary computational code development seems hard to a geodynamics modeler (i.e. a non-computer-scientist). For example, to utilise contemporary computer hardware, parallelisation is required. But why do we choose the explicit approach (MPI) over an implicit (OpenMP) one? How does this relate to typical geodynamics codes? And do we face this same style of problem in everyday life? We aim to demonstrate that the little bit of complexity, forethought and effort is worth its while.
Calibrationless parallel magnetic resonance imaging: a joint sparsity model.

PubMed

Majumdar, Angshul; Chaudhury, Kunal Narayan; Ward, Rabab

2013-12-05

State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity map for SENSE and SMASH and the interpolation weights for GRAPPA and SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we have proposed a parallel MRI technique that does not require any calibration but yields reconstruction results that are on par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method requires solving non-convex analysis-prior and synthesis-prior joint-sparsity problems. This work also derives the algorithms for solving them. Experimental validation was carried out on two datasets: eight-channel brain data and an eight-channel Shepp-Logan phantom. Two sampling methods were used: Variable Density Random sampling and non-Cartesian Radial sampling. For the brain data an acceleration factor of 4 was used, and for the other an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed images and the originals. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques: two calibrated methods (CS SENSE and l1SPIRiT) and two calibration-free techniques (Distributed CS and SAKE). Our method yields better reconstruction results than all of them.
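Joint sparsity across receiver channels is commonly encouraged with the mixed l2,1 norm. Its proximal operator shrinks each row of a coefficient matrix as a group, where a row holds one transform coefficient across all coils. The sketch below shows only that convex building block. The authors solve non-convex problems, so this illustrates the joint-sparsity idea and is not their algorithm.

```python
import math

def row_soft_threshold(X, tau):
    """Proximal operator of tau * sum_r ||X[r]||_2 (the l2,1 mixed norm).

    Row r holds one transform coefficient across all receiver channels, so a
    row is either shrunk towards zero or zeroed as a whole: the channels share
    one sparsity pattern. Entries may be complex.
    """
    out = []
    for row in X:
        norm = math.sqrt(sum(abs(v) ** 2 for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out
```

In an iterative reconstruction, a step like this would be applied to the coefficients at each iteration, alternating with a data-consistency step.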
Architecture Adaptive Computing Environment

NASA Technical Reports Server (NTRS)

Dorband, John E.

2006-01-01

Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Implicit Solvation Parameters Derived from Explicit Water Forces in Large-Scale Molecular Dynamics Simulations

PubMed Central

2012-01-01

Implicit solvation is a mean force approach to model solvent forces acting on a solute molecule. It is frequently used in molecular simulations to reduce the computational cost of solvent treatment. In the first instance, the free energy of solvation and the associated solvent–solute forces can be approximated by a function of the solvent-accessible surface area (SASA) of the solute and differentiated by an atom-specific solvation parameter $\sigma_i^{\mathrm{SASA}}$. A procedure for the determination of values for the $\sigma_i^{\mathrm{SASA}}$ parameters through matching of explicit and implicit solvation forces is proposed. Using the results of Molecular Dynamics simulations of 188 topologically diverse protein structures in water and in implicit solvent, values for the $\sigma_i^{\mathrm{SASA}}$ parameters for atom types $i$ of the standard amino acids in the GROMOS force field have been determined. A simplified representation based on groups of atom types, $\sigma_g^{\mathrm{SASA}}$, was obtained via partitioning of the atom-type $\sigma_i^{\mathrm{SASA}}$ distributions by dynamic programming. Three groups of atom types with well separated parameter ranges were obtained, and their performance in implicit versus explicit simulations was assessed. The solvent forces are available at http://mathbio.nimr.mrc.ac.uk/wiki/Solvent_Forces. PMID:23180979
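In the simplest SASA model the solvation free energy is a weighted sum over atoms, ΔG_solv ≈ Σ_i σ_i A_i, where A_i is the solvent-accessible surface area of atom i and σ_i is the parameter of its atom type. This linear form is assumed here. The sketch evaluates that sum, optionally through an atom-type-to-group map that mirrors the simplified grouped representation. All parameter values and group names are made-up placeholders, not the fitted GROMOS values from the paper.

```python
def solvation_energy(sasa, atom_types, sigma, group_of=None):
    """Delta G_solv ~= sum_i sigma[type_i] * A_i.

    sasa:       per-atom solvent-accessible surface areas A_i
    atom_types: atom type of each atom
    sigma:      solvation parameter per atom type (or per group if group_of is given)
    group_of:   optional map atom type -> group, for the simplified grouped model
    """
    total = 0.0
    for area, atype in zip(sasa, atom_types):
        key = group_of[atype] if group_of else atype
        total += sigma[key] * area
    return total

# Placeholder values (hypothetical units: kJ/mol per nm^2, and nm^2).
sigma_by_type = {"C": 0.5, "O": -1.0, "N": -0.8}
sasa = [0.20, 0.10, 0.05]
types = ["C", "O", "N"]
energy_by_type = solvation_energy(sasa, types, sigma_by_type)

# Grouped variant: several atom types share one parameter (hypothetical grouping).
group_of = {"C": "apolar", "O": "polar", "N": "polar"}
sigma_by_group = {"apolar": 0.5, "polar": -1.0}
energy_by_group = solvation_energy(sasa, types, sigma_by_group, group_of)
```

The solvent force on each atom is then the negative gradient of this energy with respect to atomic positions, through the dependence of each A_i on the coordinates. Computing the SASA itself is outside this sketch.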
Multidimensional, fully implicit, exactly conserving electromagnetic particle-in-cell simulations

NASA Astrophysics Data System (ADS)

Chacon, Luis

2015-09-01

We discuss a new, conservative, fully implicit 2D-3V particle-in-cell algorithm for non-radiative, electromagnetic kinetic plasma simulations, based on the Vlasov-Darwin model. Unlike earlier linearly implicit PIC schemes and standard explicit PIC schemes, fully implicit PIC algorithms are unconditionally stable and allow exact discrete energy and charge conservation. This has been demonstrated in 1D electrostatic and electromagnetic contexts. In this study, we build on these recent algorithms to develop an implicit, orbit-averaged, time-space-centered finite difference scheme for the Darwin field and particle orbit equations for multiple species in multiple dimensions. The Vlasov-Darwin model is very attractive for PIC simulations because it avoids radiative noise issues in non-radiative electromagnetic regimes. The algorithm conserves global energy, local charge, and particle canonical-momentum exactly, even with grid packing. The nonlinear iteration is effectively accelerated with a fluid preconditioner, which allows efficient use of large timesteps, a factor of $O(\sqrt{m_i/m_e}\,c/v_{eT})$ larger than the explicit CFL limit. In this presentation, we will introduce the main algorithmic components of the approach, and demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 1D and 2D. This work was supported by the LANL LDRD program and the DOE-SC ASCR office.
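A much simpler problem than the Vlasov-Darwin system shows why a time-centered implicit scheme can conserve energy exactly when a standard explicit update does not. The problem is a single particle in a harmonic potential. The sketch compares forward Euler with the implicit midpoint rule. The midpoint rule is time-centered, and here it is solved in closed form because the toy force is linear. This is an analogy only, not the author's PIC algorithm or its preconditioned nonlinear solver.

```python
def explicit_euler(x, v, omega, dt):
    """Forward Euler: uses only old values; energy grows by (1 + omega^2 dt^2) per step."""
    return x + dt * v, v - dt * omega**2 * x

def implicit_midpoint(x, v, omega, dt):
    """Time-centred implicit step:
        x1 = x + dt * (v + v1) / 2
        v1 = v - dt * omega^2 * (x + x1) / 2
    solved exactly for this linear force."""
    a, b = dt / 2, dt * omega**2 / 2
    x1 = ((1 - a * b) * x + 2 * a * v) / (1 + a * b)
    v1 = v - b * (x + x1)
    return x1, v1

def energy(x, v, omega):
    return 0.5 * v**2 + 0.5 * omega**2 * x**2

def run(step, n_steps, x=1.0, v=0.0, omega=1.0, dt=0.5):
    """Advance n_steps with the given integrator and return the final energy."""
    for _ in range(n_steps):
        x, v = step(x, v, omega, dt)
    return energy(x, v, omega)
```

With these illustrative settings the initial energy is 0.5. The midpoint-rule energy stays at that value up to round-off, because the scheme preserves quadratic invariants. The Euler energy is multiplied by 1 + ω²dt² = 1.25 at every step.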
The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

NASA Technical Reports Server (NTRS)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance caused by poor scalability have hindered the take-up of this programming model. Significant progress has since been made in hardware and software technologies, and as a result the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work demonstrates not only the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
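The toolkit automates a kind of decision that can be sketched with a deliberately conservative toy dependence test: can a loop carry an OpenMP `PARALLEL DO` directive, possibly with a `REDUCTION` clause? The sketch assumes unit-stride subscripts of the form `i + c` and uses a made-up statement representation (`Stmt`). The toolkit's actual interprocedural analysis is far more general, and nothing here reflects its internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Stmt:
    """One statement inside `DO i`: target(i + offset) op f(reads).

    offset=None marks a scalar target; reads lists (name, offset) pairs,
    again with offset=None for scalars. op is "=" or "+=" (accumulation).
    """
    target: str
    offset: Optional[int]
    reads: List[Tuple[str, Optional[int]]] = field(default_factory=list)
    op: str = "="

def classify_loop(body: List[Stmt]) -> str:
    """Return an OpenMP (Fortran) directive if the toy test finds no
    loop-carried dependence, otherwise "SERIAL"."""
    reductions = []
    for s in body:
        if s.offset is None:
            # A scalar that is only accumulated (never read elsewhere) is a "+"
            # reduction; any other scalar write is treated as a dependence
            # (a real tool could privatize it instead).
            read_elsewhere = any(name == s.target for t in body for name, _ in t.reads)
            if s.op == "+=" and not read_elsewhere:
                reductions.append(s.target)
                continue
            return "SERIAL"
        # Any access to the same array at a different offset reaches an element
        # written by another iteration: a loop-carried dependence.
        for t in body:
            accesses = list(t.reads)
            if t is not s:
                accesses.append((t.target, t.offset))
            if any(name == s.target and off != s.offset for name, off in accesses):
                return "SERIAL"
    directive = "!$OMP PARALLEL DO"
    if reductions:
        directive += " REDUCTION(+:" + ",".join(reductions) + ")"
    return directive
```

For example, `a(i) = b(i) + c(i)` yields `!$OMP PARALLEL DO`, while `a(i) = a(i-1) + b(i)` is reported as `SERIAL`.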
Home-based bimanual training based on motor learning principles in children with unilateral cerebral palsy and their parents (the COAD-study): rationale and protocols.

PubMed

Schnackers, Marlous; Beckers, Laura; Janssen-Potten, Yvonne; Aarts, Pauline; Rameckers, Eugène; van der Burg, Jan; de Groot, Imelda; Smeets, Rob; Geurts, Sander; Steenbergen, Bert

2018-04-18

Home-based training is considered an important intervention in rehabilitation of children with unilateral cerebral palsy. Despite consensus on the value of home-based upper limb training, no evidence-based best practice exists. Promoting children's adherence to an intensive program while keeping parental stress levels low is an important challenge when designing home-based training programs. Incorporating implicit motor learning principles appears to be a promising method to resolve this challenge. Here we describe two protocols for home-based bimanual training programs, one based on implicit motor learning principles and one based on explicit motor learning principles, for children with unilateral spastic cerebral palsy aged 2 through 7 years. Children receive goal-oriented, task-specific bimanual training in their home environment from their parents for 3.5 h/week for 12 weeks according to an individualized program. Parents will be intensively coached by a multidisciplinary team, consisting of a pediatric therapist and a remedial educationalist. Both programs consist of a preparation phase (goal setting, introductory meetings with coaching professionals, design of individualized program, instruction of parents, home visit) and a home-based training phase (training, video-recordings, registrations, and telecoaching and home visits by the coaching team). The programs contrast with respect to the teaching strategy, i.e. how the parents support their child during training. In both programs parents provide their child with instructions and feedback that focus on the activity (i.e. task-oriented) or the result of the activity (i.e. result-oriented).
However, in the explicit program parents are in addition instructed to give exact instructions and feedback on the motor performance of the bimanual activities, whereas in the implicit program the use of both hands and the appropriate motor performance of the activity are elicited via manipulation of the organization of the activities. With the protocols described here, we aim to take a next step in the development of much needed evidence-based home-based training programs for children with unilateral cerebral palsy.
Free energy landscape of protein folding in water: explicit vs. implicit solvent.

PubMed

Zhou, Ruhong

2003-11-01

The Generalized Born (GB) continuum solvent model is arguably the most widely used implicit solvent model in protein folding and protein structure prediction simulations; however, it remains an open question how well the model behaves in these large-scale simulations. The current study uses the beta-hairpin from the C-terminus of protein G as an example to explore the folding free energy landscape with various GB models, and the results are compared to the explicit solvent simulations and experiments. All free energy landscapes are obtained from extensive conformation space sampling with a highly parallel replica exchange method. Because solvation model parameters are strongly coupled with force fields, five different force field/solvation model combinations are examined and compared in this study, namely the explicit solvent model OPLSAA/SPC and the implicit solvent models OPLSAA/SGB (Surface GB), AMBER94/GBSA (GB with Solvent Accessible Surface Area), AMBER96/GBSA, and AMBER99/GBSA. Surprisingly, we find that the free energy landscapes from implicit solvent models are quite different from that of the explicit solvent model. Except for AMBER96/GBSA, all the implicit solvent models find a lowest free energy state that is not the native state. All implicit solvent models show erroneous salt-bridge effects between charged residues, particularly the OPLSAA/SGB model, where the overly strong salt-bridge effect results in an overweighting of a non-native structure with one hydrophobic residue, F52, expelled from the hydrophobic core in order to make better salt bridges. On the other hand, both the AMBER94/GBSA and AMBER99/GBSA models turn the beta-hairpin into an alpha-helix, and the alpha-helical content is much higher than the previously reported alpha-helices in an explicit solvent simulation with AMBER94 (AMBER94/TIP3P).
Only AMBER96/GBSA shows a reasonable free energy landscape, with the native structure as the lowest free energy structure, despite an erroneous salt bridge between D47 and K50. Detailed results on free energy contour maps, lowest free energy structures, distribution of native contacts, alpha-helical content during the folding process, NOE comparison with NMR, and temperature dependences are reported and discussed for all five models. Copyright 2003 Wiley-Liss, Inc.
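The replica exchange method mentioned in this abstract swaps configurations between temperatures using a standard Metropolis criterion. Configurations at T_i and T_j with potential energies E_i and E_j are exchanged with probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). The sketch below assumes energies in kcal/mol. The function names are illustrative and not taken from the study's code.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance for swapping the configurations held at T_i and T_j."""
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def exchange_sweep(energies, temperatures, rng):
    """Attempt swaps between neighbouring temperatures, lowest first.

    energies[k] is the potential energy of the configuration currently held at
    temperatures[k]. Returns, for each temperature slot, the index of the
    configuration occupying it after the sweep.
    """
    owner = list(range(len(temperatures)))
    e = list(energies)
    for k in range(len(temperatures) - 1):
        p = exchange_probability(e[k], temperatures[k], e[k + 1], temperatures[k + 1])
        if rng.random() < p:
            e[k], e[k + 1] = e[k + 1], e[k]
            owner[k], owner[k + 1] = owner[k + 1], owner[k]
    return owner
```

Between exchange attempts, each replica runs ordinary molecular dynamics independently. This is what makes the method highly parallel.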
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

NASA Astrophysics Data System (ADS)

Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' However, few current programming languages support nested data parallelism. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
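Flattening can be illustrated with the usual segmented representation. A nested vector is stored as one flat data vector plus a vector of segment lengths. An elementwise operation inside a nested `foreach` then becomes a single flat map, and per-subcollection reductions become segmented operations. This is a sketch of the general idea in Python, not the authors' C++ translation scheme, and the helper names are hypothetical.

```python
from itertools import accumulate

def flatten(nested):
    """Represent a nested vector as (flat data, segment lengths)."""
    data = [x for seg in nested for x in seg]
    lengths = [len(seg) for seg in nested]
    return data, lengths

def unflatten(data, lengths):
    """Rebuild the nested vector from its flat representation."""
    starts = [0] + list(accumulate(lengths))[:-1]
    return [data[s:s + n] for s, n in zip(starts, lengths)]

def flat_map(f, data):
    """A nested `foreach` over inner elements becomes one flat elementwise map,
    whose iterations are all independent regardless of segment sizes."""
    return [f(x) for x in data]

def segmented_sum(data, lengths):
    """Per-segment reductions computed over the flat vector."""
    sums, pos = [], 0
    for n in lengths:
        sums.append(sum(data[pos:pos + n]))
        pos += n
    return sums
```

The flat form balances work even when inner collections have very different sizes, which is the practical payoff of flattening.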
The BLAZE language: A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
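Assume that "accumulation operators" includes an APL-style scan such as `+\`. Then the reason it expresses fine grained parallelism is that any associative scan can be computed in about log2(n) rounds of fully independent updates. The sketch simulates the Hillis-Steele formulation sequentially. It is a generic illustration, not Blaze's implementation.

```python
import operator

def hillis_steele_scan(xs, op=operator.add):
    """Inclusive scan in ceil(log2 n) rounds.

    Every update within a round reads only the previous round's values, so
    all n updates of a round are independent and could run in parallel.
    `op` must be associative (e.g. addition, multiplication, max).
    """
    out = list(xs)
    step = 1
    while step < len(out):
        prev = out
        out = [prev[k] if k < step else op(prev[k - step], prev[k]) for k in range(len(prev))]
        step *= 2
    return out
```

This formulation does O(n log n) operations in total. Work-efficient variants exist, but this one makes the round structure easiest to see.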
Adapting high-level language programs for parallel processing using data flow

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
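The abstract does not give EASY-FLOW's syntax, so the sketch below uses a made-up representation. Each subprogram call lists the variables it reads and writes, in program order, under a single-assignment assumption (each variable is written once). Read-after-write relations then give the data flow graph. Calls with no path between them fall into the same wave and could run concurrently as large-grained parallel activities.

```python
from collections import defaultdict

def data_flow_graph(calls):
    """calls: list of (name, inputs, outputs) in program order (hypothetical format).
    Adds an edge producer -> consumer whenever a call reads a variable written
    by an earlier call (read-after-write)."""
    writer = {}
    edges = defaultdict(set)
    for name, inputs, outputs in calls:
        for var in inputs:
            if var in writer:
                edges[writer[var]].add(name)
        for var in outputs:
            writer[var] = name
    return edges

def parallel_waves(calls, edges):
    """Group calls by longest-path depth; calls in one wave are independent."""
    preds = defaultdict(set)
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].add(src)
    level = {}
    for name, _, _ in calls:  # predecessors always appear earlier in program order
        level[name] = 1 + max((level[p] for p in preds[name]), default=-1)
    waves = defaultdict(list)
    for name, _, _ in calls:
        waves[level[name]].append(name)
    return [waves[k] for k in sorted(waves)]

calls = [
    ("read_grid", [], ["grid"]),
    ("read_bc", [], ["bc"]),
    ("solve_u", ["grid", "bc"], ["u"]),
    ("solve_v", ["grid", "bc"], ["v"]),
    ("write_results", ["u", "v"], []),
]
waves = parallel_waves(calls, data_flow_graph(calls))
```

Without the single-assignment assumption, write-after-read and write-after-write edges would also be needed to preserve program order.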
Explicit and implicit processes in behavioural adaptation to road width.

PubMed

Lewis-Evans, Ben; Charlton, Samuel G

2006-05-01

The finding that drivers may react to safety interventions in a way that is contrary to what was intended is the phenomenon of behavioural adaptation. This phenomenon has been demonstrated across various safety interventions and has serious implications for road safety programs the world over. The present research used a driving simulator to assess behavioural adaptation in drivers' speed and lateral displacement in response to manipulations of road width. Of interest was whether behavioural adaptation would occur and whether we could determine whether it was the result of explicit, conscious decisions or implicit perceptual processes. The results supported an implicit, zero perceived risk model of behavioural adaptation with reduced speeds on a narrowed road accompanied by increased ratings of risk and a marked inability of the participants to identify that any change in road width had occurred.
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
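The loading scheme can be simulated sequentially under stated assumptions. The load description format is hypothetical (a mapping from program to node ids). The number of programs loaded at once is a plain parameter. The load leader is taken to be the lowest node id on each class route, standing in for whatever collective selection the disclosure uses. The point illustrated is that each program is read from the file system once, by its leader, and then broadcast only to the nodes that need it.

```python
def plan_collective_load(load_description, max_parallel):
    """Group programs into batches that are loaded concurrently.

    load_description: hypothetical format {program name: [node ids that run it]}
    max_parallel:     number of programs loaded at the same time
    Each batch entry is (program, class_route, leader).
    """
    programs = sorted(load_description)
    plan = []
    for start in range(0, len(programs), max_parallel):
        batch = []
        for prog in programs[start:start + max_parallel]:
            route = sorted(set(load_description[prog]))  # only nodes needing prog
            batch.append((prog, route, route[0]))        # leader: lowest node id
        plan.append(batch)
    return plan

def simulate_load(plan, read_from_file_system):
    """Only the leader reads the file system; the image is then 'broadcast'
    to every node on the class route. Returns node -> programs and read count."""
    loaded, reads = {}, 0
    for batch in plan:
        for prog, route, leader in batch:
            image = read_from_file_system(prog)  # performed once, by `leader`
            reads += 1
            for node in route:                   # broadcast along the class route
                loaded.setdefault(node, {})[prog] = image
    return loaded, reads
```

In the real system the planning steps themselves are collective operations across nodes, which this single-process simulation does not model.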
The BLAZE language - A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

NASA Astrophysics Data System (ADS)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
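Speedup is S_p = T_1 / T_p and efficiency is E_p = S_p / p for p processors. The sketch computes both for an idealized batch of independent, equal-length runs dealt out to p workers. It ignores communication and start-up costs, and the workload numbers are hypothetical. Even this idealized model shows efficiency falling as p grows, purely from load imbalance. That is one possible contributor, not the paper's measured explanation.

```python
import math

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    return speedup(t_serial, t_parallel) / processors

def ideal_batch_time(n_runs, run_time, processors):
    """Wall time when n_runs independent, equal-length runs are shared among
    `processors` workers: the busiest worker does ceil(n_runs / processors) runs."""
    return math.ceil(n_runs / processors) * run_time

n_runs, run_time = 40, 1.0  # hypothetical workload
t1 = n_runs * run_time
for p in (8, 16, 32):
    tp = ideal_batch_time(n_runs, run_time, p)
    print(p, round(speedup(t1, tp), 2), round(efficiency(t1, tp, p), 3))
```

With 40 runs, 8 workers divide the work evenly. Workers 16 and 32 leave some processors idle in the last round, so efficiency drops even though speedup keeps rising.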
IOPA: I/O-aware parallelism adaption for parallel programs

PubMed Central

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
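A simple way to adapt the number of I/O threads to a system is to measure throughput while growing the thread count, and stop once extra threads no longer help. The sketch is a hill-climbing controller in that spirit. It is not IOPA's published algorithm or programming interface. The throughput model is a made-up stand-in for real measurements.

```python
def choose_io_threads(measure, max_threads, min_gain=0.05):
    """Double the I/O thread count while measured throughput improves by at
    least `min_gain` (relative); return the best count observed."""
    best_n, best_tp = 1, measure(1)
    n = 2
    while n <= max_threads:
        tp = measure(n)
        if tp < best_tp * (1 + min_gain):
            break
        best_n, best_tp = n, tp
        n *= 2
    return best_n

def toy_throughput(n_threads, saturation=8, per_thread=100.0, contention=30.0):
    """Hypothetical I/O model: throughput grows until the device saturates at
    `saturation` threads, then falls as extra threads contend for the disk."""
    return per_thread * min(n_threads, saturation) - contention * max(0, n_threads - saturation)
```

Doubling keeps the number of probes small but only finds powers of two. A finer search around the best count is an obvious refinement.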
IOPA: I/O-aware parallelism adaption for parallel programs.

PubMed

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Vanrosendale, John

1989-01-01

Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low-level programming environment and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The focus is on tensor product array computations, a simple but important class of numerical algorithms. The problem of programming 1-D kernel routines, such as parallel tridiagonal solvers, is addressed first; the paper then examines how such parallel kernels can be combined to form parallel tensor product algorithms.
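Here is a small example of combining 1-D kernels into a tensor product algorithm. Solving T_y X T_x^T = F, equivalently (T_y ⊗ T_x) vec(X) = vec(F) with row-major vec, needs only two passes. The first is a tridiagonal solve along every row, and the second is one along every column. The solves within each pass are independent, which is where the parallelism lies. The sketch uses the serial Thomas algorithm as the 1-D kernel for simplicity; a parallel tridiagonal solver could replace it. The function names are illustrative, not the paper's language primitives.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n) (Thomas algorithm).
    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side. Assumes no pivoting is needed (e.g. diagonal dominance)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_tensor_product(F, x_coeffs, y_coeffs):
    """Solve T_y X T_x^T = F with 1-D solves: every row (x-direction) first,
    then every column (y-direction). Solves within a pass are independent."""
    rows = [thomas(*x_coeffs, row) for row in F]
    cols = [thomas(*y_coeffs, list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

On a distributed-memory machine, the switch from the row pass to the column pass is exactly where data must be redistributed (or a distributed 1-D solver used), which is the kind of detail higher-level primitives aim to hide.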
Pedagogical perspectives and implicit theories of teaching: First year science teachers emerging from a constructivist science education program

NASA Astrophysics Data System (ADS)

Dias, Michael James

Traditional, teacher-centered pedagogies dominate current teaching practice in science education despite numerous research-based assertions that promote more progressive, student-centered teaching methods. Best-practice research emerging from science education reform efforts promotes experiential, collaborative learning environments in line with the constructivist referent. Thus there is a need to identify specific teacher education program designs that will promote the utilization of constructivist theory among new teachers. This study explored the learning-to-teach process of four first-year high school teachers, all graduates of a constructivist-based science education program known as Teacher Education Environments in Mathematics and Science (TEEMS). Pedagogical perspectives and implicit theories were explored to identify common themes and their relation to the pre-service program and the teaching context. Qualitative methods were employed to gather and analyze the data. In-depth, semi-structured interviews (Seidman, 1998) formed the primary data for probing the context and details of the teachers' experience as well as the personal meaning derived from first-year practice. Teacher journals and teaching artifacts were utilized to validate and challenge the primary data. Through an open-coding technique (Strauss & Corbin, 1990), codes and themes were generated from which assertions were made. The pedagogical perspectives apparent among the participants in this study emerged as seven patterns in teaching method: (1) utilization of grouping strategies, (2) utilization of techniques that allow the students to help teach, (3) similar format of daily instructional strategy, (4) utilization of techniques intended to promote engagement, (5) utilization of review strategies, (6) assessment by daily monitoring and traditional tests, and (7) restructuring of content knowledge.
Assertions from implicit theory data include: (1) Time constraints and lack of teaching experience made inquiry teaching difficult to implement for the first year teachers in this study. (2) Commitment to teaching and supportive relationships at the school helped the first year teachers negotiate a satisfying role. (3) A congruence existed between the first-year teachers' implicit theories and the social/experiential design of TEEMS. This congruence represented a narrowing of the gap between educational theory and practice. Implications for science-teacher education highlight the potential for experiential program designs to narrow the gap between educational theory and practice.

Methods for design and evaluation of parallel computing systems (The PISCES project)

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

1989-01-01

The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.
Anatomy structure creation and editing using 3D implicit surfaces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hibbard, Lyndon S.

2012-05-15

Purpose: To accurately reconstruct and interactively reshape 3D anatomy structures' surfaces using small numbers of 2D contours drawn in the most visually informative views of 3D imagery. The innovation of this program is that the number of 2D contours can be very much smaller than the number of transverse sections, even for anatomy structures spanning many sections. This program can edit 3D structures from prior segmentations, including those from autosegmentation programs. The reconstruction and surface editing work with any image modality. Methods: Structures are represented by variational implicit surfaces defined by weighted sums of radial basis functions (RBFs). Such surfaces are smooth, continuous, and closed and can be reconstructed with RBFs optimally located to efficiently capture shape in any combination of transverse (T), sagittal (S), and coronal (C) views. The accuracy of implicit surface reconstructions was measured by comparisons with the corresponding expert-contoured surfaces in 103 prostate cancer radiotherapy plans. Editing a pre-existing surface is done by overdrawing its profiles in image views spanning the affected part of the structure, deleting an appropriate set of prior RBFs, and merging the remainder with the new edit contour RBFs. Two methods were devised to identify RBFs to be deleted based only on the geometry of the initial surface and the locations of the new RBFs. Results: Expert-contoured surfaces were compared with implicit surfaces reconstructed from them over varying numbers and combinations of T/S/C planes. Studies revealed that surface-surface agreement increases monotonically with increasing RBF-sample density, and that the rate of increase declines over the same range. These trends were observed for all surface agreement metrics and for all the organs studied: prostate, bladder, and rectum.
In addition, S and C contours may convey more shape information than T views for CT studies in which the axial slice thickness is greater than the pixel size. Surface editing accuracy likewise improves with larger sampling densities, and the rate of improvement similarly declines over the same conditions. Conclusions: Implicit surfaces based on RBFs are accurate representations of anatomic structures and can be interactively generated or modified to correct segmentation errors. The number of input contours is typically smaller than the number of T contours spanned by the structure.
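The abstract represents a structure as a weighted sum of RBFs whose zero level set is the surface. As a minimal sketch of that idea (not the author's program), the Python below fits an implicit function f(x) = sum_i w_i * phi(|x - c_i|) plus a linear polynomial so that f takes prescribed values at sample points, with f = 0 read as the surface. The kernel phi(r) = r^3, the use of interior/exterior constraint points (negative inside, positive outside), and the names `fit_implicit_surface` and `solve` are illustrative assumptions.

```python
import math

def phi(r):
    # |r|^3 kernel, assumed here for illustration; the paper's kernel is not stated
    return r ** 3

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_implicit_surface(centers, values):
    """Fit f(x) = sum_i w_i*phi(|x - c_i|) + p0 + p1*x + p2*y + p3*z with f(c_i) = values[i]."""
    n = len(centers)
    A = [[0.0] * (n + 4) for _ in range(n + 4)]
    for i, ci in enumerate(centers):
        for j, cj in enumerate(centers):
            A[i][j] = phi(math.dist(ci, cj))
        for k, pk in enumerate((1.0, *ci)):   # linear-polynomial block and its transpose
            A[i][n + k] = pk
            A[n + k][i] = pk
    sol = solve(A, list(values) + [0.0] * 4)  # side conditions keep the system square
    w, p = sol[:n], sol[n:]

    def f(x):
        s = sum(wi * phi(math.dist(x, ci)) for wi, ci in zip(w, centers))
        return s + p[0] + p[1] * x[0] + p[2] * x[1] + p[3] * x[2]
    return f

# Toy "structure": six on-surface samples of a unit sphere (value 0), one interior
# constraint (value -1) and six exterior constraints (value +1).
axes = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
        (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
centers = axes + [(0.0, 0.0, 0.0)] + [tuple(2.0 * c for c in a) for a in axes]
values = [0.0] * 6 + [-1.0] + [1.0] * 6
f = fit_implicit_surface(centers, values)
```

Editing in the spirit of the abstract would amount to deleting some centers, adding new contour samples, and refitting.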
Multi-scale simulations of space problems with iPIC3D

NASA Astrophysics Data System (ADS)

Lapenta, Giovanni; Bettarini, Lapo; Markidis, Stefano

The implicit Particle-in-Cell method for the computer simulation of space plasma, and its implementation in a three-dimensional parallel code, called iPIC3D, are presented. The implicit integration in time of the Vlasov-Maxwell system removes the numerical stability constraints and enables kinetic plasma simulations at magnetohydrodynamics scales. Simulations of magnetic reconnection in plasma are presented to show the effectiveness of the algorithm. In particular, we will show a number of simulations done for large-scale 3D systems using the physical mass ratio for hydrogen. Most notably, one simulation treats kinetically a box of tens of Earth radii in each direction and was conducted using about 16000 processors of the Pleiades NASA computer. The work is conducted in collaboration with the MMS-IDS theory team from University of Colorado (M. Goldman, D. Newman and L. Andersson). Reference: Stefano Markidis, Giovanni Lapenta, Rizwan-uddin, "Multi-scale simulations of plasma with iPIC3D," Mathematics and Computers in Simulation, available online 17 October 2009, http://dx.doi.org/10.1016/j.matcom.2009.08.038
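Why implicit time integration lifts the explicit stability limit can be seen on a toy problem. The sketch below is not the iPIC3D implicit moment method; it assumes a single undamped oscillator x'' = -omega^2 x (a stand-in for a plasma oscillation at frequency omega) and compares explicit leapfrog, which is stable only for omega*dt < 2, with the implicit midpoint rule. The function names are hypothetical.

```python
def leapfrog(x, v, omega, dt, steps):
    """Explicit leapfrog (velocity-Verlet form) for x'' = -omega**2 * x."""
    for _ in range(steps):
        v_half = v - 0.5 * dt * omega * omega * x
        x = x + dt * v_half
        v = v_half - 0.5 * dt * omega * omega * x
    return x, v

def implicit_midpoint(x, v, omega, dt, steps):
    """Implicit midpoint rule; the 2x2 linear system is solved in closed form."""
    a, b = 0.5 * dt, 0.5 * dt * omega * omega
    for _ in range(steps):
        # Solve x1 = x0 + a*(v0 + v1) and v1 = v0 - b*(x0 + x1) for (x1, v1)
        x_new = ((1.0 - a * b) * x + 2.0 * a * v) / (1.0 + a * b)
        v = v - b * (x + x_new)
        x = x_new
    return x, v

def energy(x, v, omega):
    return omega * omega * x * x + v * v

# omega * dt = 10 is far beyond the explicit limit omega * dt < 2.
print(energy(*leapfrog(1.0, 0.0, 1.0, 10.0, 20), 1.0))            # grows by many orders of magnitude
print(energy(*implicit_midpoint(1.0, 0.0, 1.0, 10.0, 1000), 1.0))  # stays at 1.0 up to round-off
```

The implicit step stays bounded at any step size, but it still has to be resolved accurately where the physics matters; stability alone does not guarantee accuracy.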
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chacon, Luis; Stanier, Adam John

Here, we demonstrate a scalable fully implicit algorithm for the two-field low-β extended MHD model. This reduced model describes plasma behavior in the presence of strong guide fields, and is of significant practical impact both in nature and in laboratory plasmas. The model displays strong hyperbolic behavior, as manifested by the presence of fast dispersive waves, which make a fully implicit treatment very challenging. In this study, we employ a Jacobian-free Newton–Krylov nonlinear solver, for which we propose a physics-based preconditioner that renders the linearized set of equations suitable for inversion with multigrid methods. As a result, the algorithm is shown to scale both algorithmically (i.e., the iteration count is insensitive to grid refinement and timestep size) and in parallel in a weak-scaling sense, with the wall-clock time scaling weakly with the number of cores for up to 4096 cores. For a 4096 × 4096 mesh, we demonstrate a wall-clock-time speedup of ~6700 with respect to explicit algorithms. The model is validated linearly (against linear theory predictions) and nonlinearly (against fully kinetic simulations), demonstrating excellent agreement.
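A Jacobian-free Newton–Krylov solver never forms the Jacobian: each Krylov iteration only needs the product J(u)v, which is approximated by the finite difference (F(u + eps*v) - F(u))/eps. The sketch below assumes a tiny two-equation system, a fixed eps, and a hand-written unpreconditioned, unrestarted GMRES; the physics-based multigrid preconditioner described in the abstract is not shown, and all names are illustrative.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def gmres(matvec, b, m, tol=1e-12):
    """Unrestarted GMRES (Arnoldi + Givens rotations) for matvec(x) = b, x0 = 0."""
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    cs, sn = [0.0] * m, [0.0] * m
    g = [beta] + [0.0] * m
    k = 0
    for j in range(m):
        w = matvec(V[j])
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i][j] = sum(wi * vi for wi, vi in zip(w, V[i]))
            w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
        h_next = norm(w)
        for i in range(j):                      # apply earlier rotations to column j
            t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
            H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
            H[i][j] = t
        r = math.hypot(H[j][j], h_next)         # new rotation zeroes h_next
        cs[j], sn[j] = H[j][j] / r, h_next / r
        H[j][j] = r
        g[j + 1] = -sn[j] * g[j]
        g[j] = cs[j] * g[j]
        k = j + 1
        if abs(g[j + 1]) < tol or h_next == 0.0:
            break
        V.append([wi / h_next for wi in w])
    y = [0.0] * k                               # back-substitute the k x k triangle
    for i in range(k - 1, -1, -1):
        y[i] = (g[i] - sum(H[i][l] * y[l] for l in range(i + 1, k))) / H[i][i]
    x = [0.0] * n
    for i in range(k):
        x = [xi + y[i] * vi for xi, vi in zip(x, V[i])]
    return x

def jfnk(F, u0, tol=1e-10, max_newton=25, eps=1e-7):
    u = list(u0)
    for _ in range(max_newton):
        Fu = F(u)
        if norm(Fu) < tol:
            break

        def jv(v, u=u, Fu=Fu):
            # Finite-difference directional derivative: J(u) v ~ (F(u + eps v) - F(u)) / eps
            Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
            return [(a - b) / eps for a, b in zip(Fp, Fu)]

        du = gmres(jv, [-fi for fi in Fu], m=len(u))   # Newton step J du = -F
        u = [ui + di for ui, di in zip(u, du)]
    return u

def F(u):
    return [u[0] ** 2 + u[1] ** 2 - 4.0, u[0] - u[1]]

u = jfnk(F, [1.0, 0.5])
```

In a production solver the Krylov iteration would be preconditioned (the abstract's physics-based preconditioner plus multigrid) and eps would be scaled to the size of u.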
Nonlinearly preconditioned semismooth Newton methods for variational inequality solution of two-phase flow in porous media

NASA Astrophysics Data System (ADS)

Yang, Haijian; Sun, Shuyu; Yang, Chao

2017-03-01

Most existing methods for solving two-phase flow problems in porous media do not take the physically feasible saturation fractions between 0 and 1 into account, which often destroys the numerical accuracy and physical interpretability of the simulation. To calculate the solution without the loss of this basic requirement, we introduce a variational inequality formulation of the saturation equilibrium with a box inequality constraint, and use a conservative finite element method for the spatial discretization and a backward differentiation formula with adaptive time stepping for the temporal integration. The resulting variational inequality system at each time step is solved by using a semismooth Newton algorithm. To accelerate the Newton convergence and improve the robustness, we employ a family of adaptive nonlinear elimination methods as a nonlinear preconditioner. Some numerical results are presented to demonstrate the robustness and efficiency of the proposed algorithm. A comparison is also included to show the superiority of the proposed fully implicit approach over the classical IMplicit Pressure-Explicit Saturation (IMPES) method in terms of the time step size and the total execution time measured on a parallel computer.
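The box constraint 0 <= s <= 1 turns each equation into a complementarity condition, which can be written as the nonsmooth equation Phi_i(s) = mid(s_i - 1, s_i, F_i(s)) = 0 and solved by a semismooth Newton method. The sketch below assumes a linear F(s) = A s - b in place of the discretized two-phase flow equations and omits the nonlinear elimination preconditioner; the median ("mid") reformulation is one common choice, not necessarily the one used by the authors, and the helper names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def semismooth_newton_box(A, b, s0, lo=0.0, hi=1.0, tol=1e-12, max_iter=50):
    """Find lo <= s <= hi with F(s) = A s - b satisfying the box complementarity
    conditions, via Phi_i(s) = mid(s_i - hi, s_i - lo, F_i(s)) = 0."""
    n = len(b)
    s = list(s0)
    for _ in range(max_iter):
        F = [sum(A[i][j] * s[j] for j in range(n)) - b[i] for i in range(n)]
        phi, J = [], []
        for i in range(n):
            if F[i] <= s[i] - hi:          # upper bound active
                phi.append(s[i] - hi)
                J.append([1.0 if j == i else 0.0 for j in range(n)])
            elif F[i] >= s[i] - lo:        # lower bound active
                phi.append(s[i] - lo)
                J.append([1.0 if j == i else 0.0 for j in range(n)])
            else:                          # inactive: F_i(s) = 0 must hold
                phi.append(F[i])
                J.append(list(A[i]))       # row of the Jacobian of F
        if max(abs(p) for p in phi) < tol:
            break
        d = solve(J, [-p for p in phi])    # generalized Newton step
        s = [si + di for si, di in zip(s, d)]
    return s

# The unconstrained solution of A s = b is (2.5, 1.0, -0.5), outside [0, 1].
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [4.0, 0.0, -2.0]
s = semismooth_newton_box(A, b, [0.5, 0.5, 0.5])
```

Here the solver returns (1.0, 0.5, 0.0): the first and last unknowns sit on the bounds, and the middle equation is satisfied exactly, so the bounds are honored by construction rather than by clipping afterwards.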
On the Helix Propensity in Generalized Born Solvent Descriptions of Modeling the Dark Proteome

PubMed Central

Olson, Mark A.

2017-01-01

Intrinsically disordered proteins that populate the so-called “Dark Proteome” offer challenging benchmarks of atomistic simulation methods to accurately model conformational transitions on a multidimensional energy landscape. This work explores the application of parallel tempering with implicit solvent models as a computational framework to capture the conformational ensemble of an intrinsically disordered peptide derived from the Ebola virus protein VP35. A recent X-ray crystallographic study reported a protein-peptide interface where the VP35 peptide underwent a folding transition from a disordered form to a helix-β-turn-helix topological fold upon molecular association with the Ebola protein NP. An assessment is provided of the accuracy of two generalized Born solvent models (GBMV2 and GBSW2) using the CHARMM force field and applied with temperature-based replica exchange dynamics to calculate the disorder propensity of the peptide and its probability density of states in a continuum solvent. A further comparison is presented of applying an explicit/implicit solvent hybrid replica exchange simulation of the peptide to determine the effect of modeling water interactions at the all-atom resolution. PMID:28197405
On the Helix Propensity in Generalized Born Solvent Descriptions of Modeling the Dark Proteome.

PubMed

Olson, Mark A

2017-01-01

Intrinsically disordered proteins that populate the so-called "Dark Proteome" offer challenging benchmarks of atomistic simulation methods to accurately model conformational transitions on a multidimensional energy landscape. This work explores the application of parallel tempering with implicit solvent models as a computational framework to capture the conformational ensemble of an intrinsically disordered peptide derived from the Ebola virus protein VP35. A recent X-ray crystallographic study reported a protein-peptide interface where the VP35 peptide underwent a folding transition from a disordered form to a helix-β-turn-helix topological fold upon molecular association with the Ebola protein NP. An assessment is provided of the accuracy of two generalized Born solvent models (GBMV2 and GBSW2) using the CHARMM force field and applied with temperature-based replica exchange dynamics to calculate the disorder propensity of the peptide and its probability density of states in a continuum solvent. A further comparison is presented of applying an explicit/implicit solvent hybrid replica exchange simulation of the peptide to determine the effect of modeling water interactions at the all-atom resolution.
Computer-aided programming for message-passing system; Problems and a solution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, M.Y.; Gajski, D.D.

1989-12-01

As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Fast viscosity solutions for shape from shading under a more realistic imaging model

NASA Astrophysics Data System (ADS)

Wang, Guohui; Han, Jiuqiang; Jia, Honghai; Zhang, Xinman

2009-11-01

Shape from shading (SFS) has been a classical and important problem in the domain of computer vision. The goal of SFS is to reconstruct the 3-D shape of an object from its 2-D intensity image. To this end, an image irradiance equation describing the relation between the shape of a surface and its corresponding brightness variations is used, and it is then rewritten as an explicit partial differential equation (PDE). Using the nonlinear programming principle, we propose a detailed solution to Prados and Faugeras's implicit scheme for approximating the viscosity solution of the resulting PDE. Furthermore, by combining implicit and semi-implicit schemes, a new approximation scheme is presented. To accelerate convergence, we apply the Gauss-Seidel idea and an alternating sweeping strategy to the approximation schemes. Experiments on both synthetic and real images demonstrate that the proposed methods are fast and accurate.
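For the simplest orthographic, Lambertian SFS model with a vertical light source, the irradiance equation reduces to the eikonal equation |grad u| = sqrt(1/I^2 - 1). That reduction is an assumption made here; the paper's perspective model of Prados and Faugeras is different. The sketch below shows the Gauss-Seidel idea with alternating sweep orderings (the fast sweeping method) on that simpler equation, using a Godunov upwind update; `fast_sweeping` and its parameters are hypothetical.

```python
import math

def fast_sweeping(rhs, sources, h=1.0, max_iters=50, tol=1e-12):
    """Solve |grad u| = rhs on a grid (u = 0 at sources) with Godunov upwind
    updates applied Gauss-Seidel style in four alternating sweep orderings."""
    ny, nx = len(rhs), len(rhs[0])
    INF = float("inf")
    fixed = set(sources)
    u = [[0.0 if (i, j) in fixed else INF for j in range(nx)] for i in range(ny)]
    orders = [(range(ny), range(nx)), (range(ny), range(nx - 1, -1, -1)),
              (range(ny - 1, -1, -1), range(nx)),
              (range(ny - 1, -1, -1), range(nx - 1, -1, -1))]
    for _ in range(max_iters):
        change = 0.0
        for rows, cols in orders:
            for i in rows:
                for j in cols:
                    if (i, j) in fixed:
                        continue
                    a = min(u[i - 1][j] if i > 0 else INF, u[i + 1][j] if i < ny - 1 else INF)
                    b = min(u[i][j - 1] if j > 0 else INF, u[i][j + 1] if j < nx - 1 else INF)
                    if a == INF and b == INF:
                        continue              # no information has reached this node yet
                    fh = rhs[i][j] * h
                    if abs(a - b) >= fh:      # one-sided (upwind) update
                        cand = min(a, b) + fh
                    else:                     # two-sided update
                        cand = 0.5 * (a + b + math.sqrt(2.0 * fh * fh - (a - b) ** 2))
                    if cand < u[i][j]:        # Gauss-Seidel: new value used immediately
                        change = max(change, u[i][j] - cand)
                        u[i][j] = cand
        if change < tol:
            break
    return u

# Constant right-hand side and a single source: u approximates distance to the source.
n = 21
u = fast_sweeping([[1.0] * n for _ in range(n)], [(10, 10)])
```

Each ordering follows one family of characteristic directions, so a few sweeps usually suffice, which is the speed-up the alternating strategy is meant to deliver.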
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.
Program summary
Program title: PNB.f90
Catalogue identifier: AEIK_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3052
No. of bytes in distributed program, including test data, etc.: 68 600
Distribution format: tar.gz
Programming language: Fortran 90 and OpenMPI
Computer: All shared or distributed memory parallel processors
Operating system: Unix/Linux
Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized.
RAM: Dependent upon N
Classification: 4.3, 4.12, 6.5
Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation.
Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree.
Running time: 5.1 s for the demo program supplied with the package.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-26

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

PubMed

Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

2014-07-05

A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limit on the number of parallel processes because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048³ grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
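The slab limitation comes down to simple arithmetic. A slab decomposition of an n^3 grid splits only one axis, so it cannot use more than n processes, whereas a volumetric (3-D block) decomposition splits all three axes. The sketch below assumes one process per node and an evenly dividing 32 x 32 x 16 process grid; both choices and the helper names are illustrative, not taken from the paper.

```python
def slab_max_processes(n):
    """A slab decomposition of an n^3 grid splits one axis, so at most n processes."""
    return n

def volumetric_block(n, grid):
    """Local block shape when an n^3 grid is split over a px x py x pz process grid."""
    px, py, pz = grid
    if n % px or n % py or n % pz:
        raise ValueError("this sketch assumes the process grid divides n evenly")
    return (n // px, n // py, n // pz)

n = 2048
processes = 16_384                 # assumption: one process per node
print(slab_max_processes(n) >= processes)    # False: slabs cannot feed every node
print(volumetric_block(n, (32, 32, 16)))     # (64, 64, 128)
```

The price of the volumetric layout is that each 1-D FFT stage needs data exchanges within a row or column of the process grid, rather than a single global transpose.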
On multigrid solution of the implicit equations of hydrodynamics. Experiments for the compressible Euler equations in general coordinates

NASA Astrophysics Data System (ADS)

Kifonidis, K.; Müller, E.

2012-08-01

Aims: We describe and study a family of new multigrid iterative solvers for the multidimensional, implicitly discretized equations of hydrodynamics. Schemes of this class are free of the Courant-Friedrichs-Lewy condition. They are intended for simulations in which widely differing wave propagation timescales are present. A preferred solver in this class is identified. Applications to some simple stiff test problems governed by the compressible Euler equations are presented to evaluate the convergence behavior and the stability properties of this solver. Algorithmic areas are determined where further work is required to make the method sufficiently efficient and robust for future application to difficult astrophysical flow problems. Methods: The basic equations are formulated and discretized on non-orthogonal, structured curvilinear meshes. Roe's approximate Riemann solver and a second-order accurate reconstruction scheme are used for spatial discretization. Implicit Runge-Kutta (ESDIRK) schemes are employed for temporal discretization. The resulting discrete equations are solved with a full-coarsening, non-linear multigrid method. Smoothing is performed with multistage-implicit smoothers. These are applied here to the time-dependent equations by means of dual time stepping. Results: For steady-state problems, our results show that the efficiency of the present approach is comparable to the best implicit solvers for conservative discretizations of the compressible Euler equations that can be found in the literature. The use of red-black as opposed to symmetric Gauss-Seidel iteration in the multistage-smoother is found to have only a minor impact on multigrid convergence. This should enable scalable parallelization without having to seriously compromise the method's algorithmic efficiency. For time-dependent test problems, our results reveal that the multigrid convergence rate degrades with increasing Courant numbers (i.e. time step sizes).
Beyond a Courant number of nine thousand, even complete multigrid breakdown is observed. Local Fourier analysis indicates that the degradation of the convergence rate is associated with the coarse-grid correction algorithm. An implicit scheme for the Euler equations that makes use of the present method was, nevertheless, able to outperform a standard explicit scheme on a time-dependent problem with a Courant number of order 1000. Conclusions: For steady-state problems, the described approach enables the construction of parallelizable, efficient, and robust implicit hydrodynamics solvers. The applicability of the method to time-dependent problems is presently restricted to cases with moderately high Courant numbers. This is due to an insufficient coarse-grid correction of the employed multigrid algorithm for large time steps. Further research will be required to help us to understand and overcome the observed multigrid convergence difficulties for time-dependent problems.
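The point about red-black Gauss-Seidel is that all points of one color can be updated independently, which is what makes the smoother parallel. The sketch below places that smoother inside a plain V-cycle for the linear 1-D Poisson problem -u'' = f with zero boundary values. It is a stand-in, not the paper's nonlinear full-coarsening multigrid for the Euler equations, and the function names are hypothetical.

```python
def residual(u, f, h):
    """r = f - A u for A u = (2u_i - u_{i-1} - u_{i+1}) / h^2, zero Dirichlet ends."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def red_black_gs(u, f, h):
    """One red-black Gauss-Seidel sweep; points of one color are mutually independent."""
    n = len(u)
    for color in (0, 1):
        for i in range(color, n, 2):    # these updates could run in parallel
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])

def v_cycle(u, f, h):
    """V(1,1) cycle; len(u) must be 2**k - 1 so every level coarsens cleanly."""
    n = len(u)
    if n == 1:                          # coarsest grid: solve 2u/h^2 = f exactly
        u[0] = 0.5 * h * h * f[0]
        return u
    red_black_gs(u, f, h)               # pre-smoothing
    r = residual(u, f, h)
    nc = (n - 1) // 2                   # coarse point j sits on fine point 2j + 1
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)
    for j in range(nc):                 # linear interpolation of the correction
        u[2 * j + 1] += ec[j]
    for j in range(nc + 1):
        left = ec[j - 1] if j > 0 else 0.0
        right = ec[j] if j < nc else 0.0
        u[2 * j] += 0.5 * (left + right)
    red_black_gs(u, f, h)               # post-smoothing
    return u

n = 2 ** 6 - 1
h = 1.0 / (n + 1)
f = [1.0] * n                           # -u'' = 1, exact solution x(1 - x)/2
u = [0.0] * n
for _ in range(30):
    v_cycle(u, f, h)
```

Because the three-point stencil is exact for quadratics, the converged discrete solution matches x(1 - x)/2 at the grid points, which makes the model problem easy to check.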
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Saini, Subhash (Technical Monitor)

1998-01-01

Parallel programming is still based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called F-Nets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
A Fourier collocation time domain method for numerically solving Maxwell's equations

NASA Technical Reports Server (NTRS)

Shebalin, John V.

1991-01-01

A new method for solving Maxwell's equations in the time domain for arbitrary values of permittivity, conductivity, and permeability is presented. Spatial derivatives are found by a Fourier transform method and time integration is performed using a second order, semi-implicit procedure. Electric and magnetic fields are collocated on the same grid points, rather than on interleaved points, as in the Finite Difference Time Domain (FDTD) method. Numerical results are presented for the propagation of a 2-D Transverse Electromagnetic (TEM) mode out of a parallel plate waveguide and into a dielectric and conducting medium.
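The abstract computes spatial derivatives with a Fourier transform. A minimal sketch, assuming a periodic grid on [0, 2*pi) and using a naive O(N^2) DFT for self-containment (a real code would call an FFT library): transform, multiply each mode by i*k, and transform back. Zeroing the Nyquist mode for a first derivative is a common convention assumed here; the semi-implicit time stepping is not shown, and the names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]

def spectral_derivative(values, length=2 * math.pi):
    """d/dx of periodic samples: multiply mode k by i*k*2*pi/length."""
    N = len(values)
    coeffs = dft(values)
    for m in range(N):
        if m <= (N - 1) // 2:
            k = m                      # non-negative wavenumbers
        elif N % 2 == 0 and m == N // 2:
            k = 0                      # Nyquist mode zeroed for an odd derivative
        else:
            k = m - N                  # negative wavenumbers
        coeffs[m] *= 1j * k * 2 * math.pi / length
    return [z.real for z in idft(coeffs)]

N = 16
xs = [2 * math.pi * n / N for n in range(N)]
dfdx = spectral_derivative([math.sin(x) + 0.5 * math.cos(3 * x) for x in xs])
```

For band-limited fields like this one the derivative is exact to round-off, which is why collocating E and H on the same points works without the staggering used by FDTD.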
On scheduling task systems with variable service times

NASA Astrophysics Data System (ADS)

Maset, Richard G.; Banawan, Sayed A.

1993-08-01

Several strategies have been proposed for developing optimal and near-optimal schedules for task systems (jobs consisting of multiple tasks that can be executed in parallel). Most such strategies, however, implicitly assume deterministic task service times. We show that these strategies are much less effective when service times are highly variable. We then evaluate two strategies—one adaptive, one static—that have been proposed for retaining high performance despite such variability. Both strategies are extensions of critical path scheduling, which has been found to be efficient at producing near-optimal schedules. We found the adaptive approach to be quite effective.
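Critical path scheduling ranks each task by the longest remaining path to the end of the job and, whenever a processor is free, starts the ready task with the highest rank. The sketch below is a generic list scheduler under the deterministic-service-time assumption the abstract questions; the adaptive and static extensions evaluated in the paper are not shown, and all names are illustrative.

```python
import heapq

def critical_path_schedule(durations, preds, num_procs):
    """List scheduling with priority = bottom level (longest path to an exit task)."""
    succs = {t: [] for t in durations}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    level = {}

    def bottom_level(t):
        if t not in level:
            level[t] = durations[t] + max((bottom_level(s) for s in succs[t]), default=0)
        return level[t]

    for t in durations:
        bottom_level(t)
    waiting = {t: len(preds.get(t, [])) for t in durations}
    ready = [t for t in durations if waiting[t] == 0]
    running = []                      # min-heap of (finish_time, task)
    free, now, finish = num_procs, 0.0, {}
    while ready or running:
        ready.sort(key=lambda t: -level[t])     # most critical first
        while ready and free > 0:
            t = ready.pop(0)
            heapq.heappush(running, (now + durations[t], t))
            free -= 1
        now, t = heapq.heappop(running)        # advance to the next completion
        finish[t] = now
        free += 1
        for s in succs[t]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return max(finish.values()), finish

durations = {"A": 4, "B": 1, "C": 1, "D": 4, "E": 1}
preds = {"D": ["A"]}                  # D can start only after A finishes
makespan, finish = critical_path_schedule(durations, preds, num_procs=2)
```

Starting A first keeps the A-to-D chain (length 8) busy from time 0, so the two-processor makespan equals the critical path. If actual service times deviate strongly from `durations`, these static priorities can point the wrong way, which is the weakness the paper examines.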
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Buluç, Aydın; Pothen, Alex

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE PAGES

Azad, Ariful; Buluç, Aydın; Pothen, Alex

2016-03-24

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
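For reference, the baseline these algorithms build on is augmenting-path search. Starting from a free left vertex, a BFS alternates unmatched and matched edges until it reaches a free right vertex, and flipping the edges along that path grows the matching by one. The sketch below is the classic single-source variant (Kuhn's algorithm with BFS); the paper's multiple-source searches, tree grafting, and direction-optimizing BFS are not shown, and the names are illustrative.

```python
from collections import deque

def max_bipartite_matching(adj, n_right):
    """adj[u] lists right vertices adjacent to left vertex u; returns match_left."""
    match_l = [-1] * len(adj)
    match_r = [-1] * n_right
    for root in range(len(adj)):
        parent = {}                       # right vertex -> left vertex that reached it
        queue = deque([root])
        end = -1
        while queue and end < 0:
            u = queue.popleft()
            for v in adj[u]:
                if v in parent:
                    continue
                parent[v] = u
                if match_r[v] == -1:      # free right vertex: augmenting path found
                    end = v
                    break
                queue.append(match_r[v])  # continue along the matched edge
        v = end
        while v != -1:                    # flip matched/unmatched edges along the path
            u = parent[v]
            next_v = match_l[u]
            match_l[u], match_r[v] = v, u
            v = next_v
    return match_l

matching = max_bipartite_matching([[0, 1], [0], [1, 2]], n_right=3)
```

In this example, left vertex 1 can only use right vertex 0, so the search from vertex 1 reroutes left vertex 0 to right vertex 1 through an augmenting path. Long paths of this kind are exactly what limits concurrency in the parallel setting the abstract describes.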

Using CLIPS in the domain of knowledge-based massively parallel programming

NASA Technical Reports Server (NTRS)

Dvorak, Jiri J.

1994-01-01

The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
SIERRA Multimechanics Module: Aria User Manual Version 4.44

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sierra Thermal/Fluid Team

2017-04-01

Aria is a Galerkin finite element based program for solving coupled-physics problems described by systems of PDEs and is capable of solving nonlinear, implicit, transient and direct-to-steady state problems in two and three dimensions on parallel architectures. The suite of physics currently supported by Aria includes thermal energy transport, species transport, and electrostatics as well as generalized scalar, vector and tensor transport equations. Additionally, Aria includes support for manufacturing process flows via the incompressible Navier-Stokes equations specialized to a low Reynolds number (Re < 1) regime. Enhanced modeling support of manufacturing processing is made possible through use of either arbitrary Lagrangian-Eulerian (ALE) or level set based free and moving boundary tracking in conjunction with quasi-static nonlinear elastic solid mechanics for mesh control. Coupled physics problems are solved in several ways including fully-coupled Newton's method with analytic or numerical sensitivities, fully-coupled Newton-Krylov methods and a loosely-coupled nonlinear iteration about subsets of the system that are solved using combinations of the aforementioned methods. Error estimation, uniform and dynamic h-adaptivity and dynamic load balancing are some of Aria's more advanced capabilities. Aria is based upon the Sierra Framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sierra Thermal/Fluid Team

Aria is a Galerkin finite element based program for solving coupled-physics problems described by systems of PDEs and is capable of solving nonlinear, implicit, transient and direct-to-steady state problems in two and three dimensions on parallel architectures. The suite of physics currently supported by Aria includes thermal energy transport, species transport, and electrostatics as well as generalized scalar, vector and tensor transport equations. Additionally, Aria includes support for manufacturing process flows via the incompressible Navier-Stokes equations specialized to a low Reynolds number (Re < 1) regime. Enhanced modeling support of manufacturing processing is made possible through use of either arbitrary Lagrangian-Eulerian (ALE) or level set based free and moving boundary tracking in conjunction with quasi-static nonlinear elastic solid mechanics for mesh control. Coupled physics problems are solved in several ways including fully-coupled Newton's method with analytic or numerical sensitivities, fully-coupled Newton-Krylov methods and a loosely-coupled nonlinear iteration about subsets of the system that are solved using combinations of the aforementioned methods. Error estimation, uniform and dynamic h-adaptivity and dynamic load balancing are some of Aria's more advanced capabilities. Aria is based upon the Sierra Framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sierra Thermal/Fluid Team

Aria is a Galerkin finite element based program for solving coupled-physics problems described by systems of PDEs and is capable of solving nonlinear, implicit, transient and direct-to-steady state problems in two and three dimensions on parallel architectures. The suite of physics currently supported by Aria includes thermal energy transport, species transport, and electrostatics as well as generalized scalar, vector and tensor transport equations. Additionally, Aria includes support for manufacturing process flows via the incompressible Navier-Stokes equations specialized to a low Reynolds number (Re < 1) regime. Enhanced modeling support of manufacturing processing is made possible through use of either arbitrary Lagrangian-Eulerian (ALE) or level set based free and moving boundary tracking in conjunction with quasi-static nonlinear elastic solid mechanics for mesh control. Coupled physics problems are solved in several ways, including fully-coupled Newton's method with analytic or numerical sensitivities, fully-coupled Newton-Krylov methods and a loosely-coupled nonlinear iteration about subsets of the system that are solved using combinations of the aforementioned methods. Error estimation, uniform and dynamic h-adaptivity and dynamic load balancing are some of Aria's more advanced capabilities. Aria is based upon the Sierra Framework.
Effects of Biggest Loser exercise depictions on exercise-related attitudes.

PubMed

Berry, Tanya R; McLeod, Nicole C; Pankratow, Melanie; Walker, Jessica

2013-01-01

To examine whether participants who watched an exercise-related segment of The Biggest Loser television program would have different explicit and implicit affective exercise-related attitudes than those of control participants. University students (N=138) watched a clip of The Biggest Loser or American Idol, then completed a Go/No-go Association Task, a thought-listing task, and questionnaires measuring explicit attitudes, activity level, and mood. Participants who watched The Biggest Loser had significantly lower explicit, but not implicit, attitudes towards exercise than did control participants. There is a need to examine the influence of popular media depictions of exercise.
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization on a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.
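The abstract does not say which parallelization techniques were used. One common strategy for database searches, assumed here purely for illustration, is to partition the database, search each partition independently, and merge the best hits at the end. In this sketch `kmer_score` is a hypothetical toy similarity (shared k-mers), not BLAST's scoring or statistics, and threads stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import nlargest


def kmer_set(seq, k=3):
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_score(query, subject, k=3):
    """Toy similarity: number of shared k-mers (NOT the BLAST score)."""
    return len(kmer_set(query, k) & kmer_set(subject, k))


def search_chunk(query, chunk, k=3):
    """Score every sequence in one database partition."""
    return [(kmer_score(query, seq, k), name) for name, seq in chunk]


def parallel_search(query, database, n_workers=4, top=3, k=3):
    """Partition the database, search partitions concurrently, merge hits.

    database is a list of (name, sequence) pairs.
    """
    chunks = [database[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda c: search_chunk(query, c, k), chunks)
        hits = [hit for part in partial for hit in part]
    # The merge is the only step where the partitions interact.
    return nlargest(top, hits)
```

Because partitions never communicate until the final merge, this pattern scales well up to a moderate number of workers. In CPython, threads will not speed up this CPU-bound scoring; separate processes or machines would be needed for real gains, and threads are used only to keep the sketch self-contained.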
Programming parallel architectures: The BLAZE family of languages

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task and data parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multiple cores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or on multicores. We show that these schedulers are asymptotically optimal and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task-block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14× to 108× speedup over sequential baselines.
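The task-block idea can be illustrated in a heavily simplified form that is our own assumption, not the authors' framework or schedulers. A recursive computation is evaluated breadth-first: all pending tasks at one recursion level form a block, and one uniform operation is applied across the whole block. That is the data-parallel shape a vector unit could execute. The recursive Fibonacci tree stands in for a task-parallel program.

```python
def fib_task_blocks(n):
    """Evaluate the recursive fib(n) call tree one task block at a time.

    Each block holds every pending task at one level of the tree. The
    same operation is applied to every task in the block, so a block
    could be mapped onto SIMD lanes instead of spawning tasks one by one.
    """
    block = [n]   # the root task
    total = 0     # sum of leaf results, which equals fib(n)
    while block:
        # Base cases: fib(0) = 0 and fib(1) = 1 contribute their own value.
        total += sum(x for x in block if x < 2)
        # Every non-base task expands into its two children, fib(x-1) and
        # fib(x-2); together they form the next block.
        block = [child for x in block if x >= 2 for child in (x - 1, x - 2)]
    return total
```

Blocks grow exponentially here. A real scheduler would bound the block size and fall back to depth-first, multicore-style execution when a block gets too large or too small, which is the kind of choice the abstract's schedulers make dynamically.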
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

NASA Technical Reports Server (NTRS)

Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;

1998-01-01

Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architecture-independent parallel applications is presented.
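As an illustration of the kind of question a dependence analysis module must answer, here is the textbook GCD test, which checks whether two array references in a loop can ever touch the same element. This is not CAPTools' algorithm; its analysis is interprocedural, incorporates user knowledge, and is far richer.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """Classic GCD dependence test for a loop that writes X[a*i + b]
    and reads X[c*i + d].

    A dependence needs integers i1, i2 with a*i1 + b == c*i2 + d, i.e.
    a*i1 - c*i2 == d - b. That linear Diophantine equation has an
    integer solution only if gcd(a, c) divides (d - b).

    False proves the references are independent. True only means
    "may depend": the test ignores loop bounds, so it is conservative.
    """
    g = gcd(a, c)
    if g == 0:            # both coefficients zero: two fixed addresses
        return b == d
    return (d - b) % g == 0


# Writing X[2*i] and reading X[2*i + 1] can never collide (even vs odd),
# so such a loop could be parallelized without synchronization.
independent = not gcd_test_may_depend(2, 0, 2, 1)
```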

GroPBS: Fast Solver for Implicit Electrostatics of Biomolecules

PubMed Central

Bertelshofer, Franziska; Sun, Liping; Greiner, Günther; Böckmann, Rainer A.

2015-01-01

Knowledge about the electrostatic potential on the surface of biomolecules or biomembranes under physiological conditions is an important step in the attempt to characterize the physico-chemical properties of these molecules and, in particular, also their interactions with each other. Additionally, knowledge about solution electrostatics may also guide the design of molecules with specified properties. However, explicit water models come at a high computational cost, rendering them unsuitable for large design studies or for docking purposes. Implicit models with the water phase treated as a continuum require the numerical solution of the Poisson–Boltzmann equation (PBE). Here, we present a new flexible program for the numerical solution of the PBE, allowing for different geometries, and the explicit and implicit inclusion of membranes. It involves a discretization of space and the computation of the molecular surface. The PBE is discretized using finite differences, and the resulting set of equations is solved using a Gauss–Seidel method. It is shown for the example of the sucrose transporter ScrY that the implicit inclusion of a surrounding membrane has a strong effect also on the electrostatics within the pore region and, thus, needs to be carefully considered, e.g., in design studies on membrane proteins. PMID:26636074
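The finite-difference-plus-Gauss–Seidel combination named above can be sketched on a deliberately reduced problem of our own choosing. The equation is the one-dimensional linearized Poisson–Boltzmann equation, -φ'' + κ²φ = f, with constant permittivity, unit-free coefficients, and zero boundary potential. The inverse Debye length `kappa` and the manufactured source term are assumptions, and this is not GroPBS code.

```python
import math


def solve_linear_pb_1d(f, kappa, n, tol=1e-10, max_sweeps=100_000):
    """Gauss-Seidel solution of -phi'' + kappa^2 phi = f(x) on (0, 1)
    with phi(0) = phi(1) = 0, on n interior grid points.

    Central differences give, at each interior point i,
        (-phi[i-1] + (2 + kappa^2 h^2) phi[i] - phi[i+1]) / h^2 = f_i.
    """
    h = 1.0 / (n + 1)
    x = [i * h for i in range(n + 2)]         # includes both boundaries
    rhs = [f(xi) * h * h for xi in x]
    diag = 2.0 + (kappa * h) ** 2
    phi = [0.0] * (n + 2)                     # boundary values stay 0
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(1, n + 1):
            # Uses the already-updated phi[i-1]; that in-place reuse is
            # what distinguishes Gauss-Seidel from Jacobi iteration.
            new = (rhs[i] + phi[i - 1] + phi[i + 1]) / diag
            max_change = max(max_change, abs(new - phi[i]))
            phi[i] = new
        if max_change < tol:
            break
    return x, phi


# Manufactured check: phi(x) = sin(pi x) solves the equation when
# f(x) = (pi^2 + kappa^2) sin(pi x).
kappa = 1.0
x, phi = solve_linear_pb_1d(
    lambda s: (math.pi ** 2 + kappa ** 2) * math.sin(math.pi * s), kappa, n=20)
err = max(abs(p - math.sin(math.pi * xi)) for xi, p in zip(x, phi))
```

The remaining error comes from the second-order discretization, not the iteration. Plain Gauss–Seidel needs many sweeps as the grid is refined, which is one reason production Poisson–Boltzmann solvers often add relaxation or multigrid acceleration.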
Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

NASA Astrophysics Data System (ADS)

Akil, Mohamed

2017-05-01

Real-time processing is increasingly important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis, and, as a consequence, many different approaches to it have been proposed. The watershed transform is a well-known image segmentation tool, and it is a very data-intensive task. To accelerate watershed algorithms and obtain real-time processing, parallel architectures and programming models for multicore computing have been developed. This paper surveys approaches for the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs, that is, homogeneous multicore processors with shared memory. An efficient parallel implementation requires exploring different strategies (parallelization, distribution, distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models: we compare OpenMP (Open Multi-Processing, an application programming interface for shared-memory multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
GPU-based acceleration of computations in nonlinear finite element deformation analysis.

PubMed

Mafi, Ramin; Sirouspour, Shahin

2014-03-01

The physics of deformation for biological soft tissue is best described by nonlinear continuum mechanics-based models, which can then be discretized by the FEM for a numerical solution. However, the computational complexity of such models has limited their use in applications requiring real-time or fast response. In this work, we propose a graphics processing unit (GPU)-based implementation of the FEM using implicit time integration for dynamic nonlinear deformation analysis. This is the most general formulation of the deformation analysis. It is valid for large deformations and strains and can account for material nonlinearities. The data-parallel nature and the intense arithmetic computations of nonlinear FEM equations make it particularly suitable for implementation on a parallel computing platform such as a graphics processing unit. In this work, we present and compare two different designs based on the matrix-free and conventional preconditioned conjugate gradients algorithms for solving the FEM equations arising in deformation analysis. The speedup achieved with the proposed parallel implementations of the algorithms will be instrumental in the development of advanced surgical simulators and medical image registration methods involving soft-tissue deformation. Copyright © 2013 John Wiley & Sons, Ltd.
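The matrix-free and conventional designs differ mainly in how the operator is applied. A matrix-free solver needs only a function that returns A·p, never a stored matrix. The minimal pure-Python sketch below is our own and runs serially. It assumes a Jacobi (diagonal) preconditioner and a toy one-dimensional spring chain with hypothetical stiffnesses as the symmetric positive definite operator; the paper's preconditioner and FEM operator are not given in the abstract. On a GPU, each vector operation in the loop would become a data-parallel kernel.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(apply_a, b, inv_diag, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradients for SPD systems A x = b.

    apply_a(v) returns A @ v without ever forming A (matrix-free);
    inv_diag holds 1 / A[i][i], the Jacobi preconditioner.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x, x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * aq for ri, aq in zip(r, ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zk + (rz_new / rz) * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x


# Toy SPD operator: a chain of n masses joined by springs of stiffness
# k[j] (hypothetical values), with both ends fixed. Never stored as a matrix.
n = 8
k = [1.0 + j for j in range(n + 1)]


def stiffness_times(v):
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((k[i] + k[i + 1]) * v[i] - k[i] * left - k[i + 1] * right)
    return out


x_true = [float(i % 3) for i in range(n)]
b = stiffness_times(x_true)
x = pcg(stiffness_times, b, [1.0 / (k[i] + k[i + 1]) for i in range(n)])
```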
A parallelization method for time periodic steady state in simulation of radio frequency sheath dynamics

NASA Astrophysics Data System (ADS)

Kwon, Deuk-Chul; Shin, Sung-Sik; Yu, Dong-Hun

2017-10-01

In order to reduce the computing time in simulations of radio frequency (rf) plasma sources, various numerical schemes have been developed. It is well known that the upwind, exponential, and power-law schemes can efficiently overcome the limitation on the grid size for fluid transport simulations of high-density plasma discharges. Also, the semi-implicit method is a well-known numerical scheme for overcoming the limitation on the simulation time step. However, despite remarkable advances in numerical techniques and computing power over the last few decades, efficient multi-dimensional modeling of low-temperature plasma discharges has remained a considerable challenge. In particular, parallelization in time has been difficult for time periodic steady state problems such as capacitively coupled plasma discharges and rf sheath dynamics, because the values of the plasma parameters at the previous time step are used to calculate the new values at each time step. Therefore, we present a parallelization method for time periodic steady state problems that uses period slices. To evaluate the efficiency of the developed method, one-dimensional fluid simulations are conducted to describe rf sheath dynamics. The results show that a speedup can be achieved by using a multithreading method.
Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

Turner, Mark G.; Topp, David A.

1998-01-01

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. Twenty processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit k-ε turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so that the solution in the entire machine is no longer changing. The k-ε equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speedup of the solver portion (no I/O or body force calculation) is presented for a grid which has 227 points axially.
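The bit-for-bit requirement is achievable because an axial domain decomposition with ghost-cell (halo) exchange can perform exactly the same floating-point operations, in the same order, as the serial code. The toy sketch below assumes a one-dimensional explicit diffusion update as a stand-in for the APNASA solver, which it is not. It also simulates the subdomains within one process instead of using MPI ranks.

```python
def step_serial(u, c):
    """One explicit diffusion step; the two end values are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def step_decomposed(u, c, nparts):
    """The same step computed on nparts axial subdomains, each holding its
    owned cells plus one ghost cell per side, as a rank would after a halo
    exchange."""
    n = len(u)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    new = list(u)
    for lo, hi in zip(bounds, bounds[1:]):
        g_lo, g_hi = max(lo - 1, 0), min(hi + 1, n)
        local = u[g_lo:g_hi]                  # owned cells plus ghosts
        for i in range(max(lo, 1), min(hi, n - 1)):
            j = i - g_lo
            # Identical expression and operand order as step_serial, so the
            # floating-point result matches bit for bit.
            new[i] = local[j] + c * (local[j - 1] - 2.0 * local[j] + local[j + 1])
    return new


u0 = [(i * 37 % 11) / 7.0 for i in range(50)]   # arbitrary initial data
serial, parallel = list(u0), list(u0)
for _ in range(100):
    serial = step_serial(serial, 0.25)
    parallel = step_decomposed(parallel, 0.25, nparts=4)
print(serial == parallel)   # True: exact equality, not a tolerance check
```

Comparing with `==` rather than a tolerance is the point: any difference at all flags a parallelization bug. Global reductions such as residual norms are the usual exception, because summing partial results in a different order changes the rounding.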
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
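For reference, the classical form of Amdahl's Law that the note amends is S(p) = 1 / ((1 - f) + f/p), where f is the fraction of the work that can run in parallel and p is the number of processors. The sketch evaluates only this classical bound. The Activity Set Model's amended upper and lower bounds are not reproduced here.

```python
def amdahl_speedup(f, p):
    """Classical Amdahl speedup for parallel fraction f on p processors."""
    if not 0.0 <= f <= 1.0 or p < 1:
        raise ValueError("need 0 <= f <= 1 and p >= 1")
    return 1.0 / ((1.0 - f) + f / p)


# With f = 0.9 the speedup approaches, but never exceeds, 1 / (1 - f) = 10,
# no matter how many processors are added.
for p in (1, 10, 100, 1000):
    print(p, round(amdahl_speedup(0.9, p), 2))
```

This hard ceiling, set by the serial fraction alone, is the pessimism the note argues is excessive in light of experimental results.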
Experiences with hypercube operating system instrumentation

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Rudolph, David C.

1989-01-01

The difficulty of conceptualizing the interactions among a large number of processors makes it hard both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed necessary, for large-scale parallel computers; the enormous volume of performance data mandates visual display.
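To make "compile summary statistics" from event traces concrete, this sketch assumes a hypothetical trace format of (processor, start, end, kind) records. The abstract does not describe the instrumentation system's actual event types or file format. The sketch totals time per processor and activity and reports each processor's compute utilization.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    proc: int      # processor (node) that recorded the event
    start: float   # timestamp when the activity began
    end: float     # timestamp when it finished
    kind: str      # e.g. "compute", "send", "recv" (hypothetical labels)


def summarize(trace):
    """Total time per (processor, kind) and per-processor compute fraction."""
    totals = defaultdict(float)
    span = {}                                   # proc -> [first start, last end]
    for e in trace:
        totals[(e.proc, e.kind)] += e.end - e.start
        lo, hi = span.get(e.proc, [e.start, e.end])
        span[e.proc] = [min(lo, e.start), max(hi, e.end)]
    utilization = {
        p: totals.get((p, "compute"), 0.0) / (hi - lo) if hi > lo else 0.0
        for p, (lo, hi) in span.items()
    }
    return dict(totals), utilization


trace = [
    Event(0, 0.0, 2.0, "compute"), Event(0, 2.0, 3.0, "send"),
    Event(0, 3.0, 4.0, "compute"),
    Event(1, 0.0, 1.0, "compute"), Event(1, 1.0, 4.0, "recv"),
]
totals, utilization = summarize(trace)
```

Here processor 1 spends most of its time waiting in `recv`. A per-processor summary like this exposes that kind of load imbalance before any visualization is drawn.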
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

PubMed

Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

2013-10-24

Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.
PyPele Rewritten To Use MPI

NASA Technical Reports Server (NTRS)

Hockney, George; Lee, Seungwon

2008-01-01

A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI) [an unofficial de facto standard, language-independent application programming interface for message passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without needing to rewrite those programs.
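The design point here is keeping the user interface fixed while swapping the transport underneath. It can be sketched with Python's `concurrent.futures.Executor` abstraction. In this hypothetical `run_cases` function, which is not PyPele's API, user code supplies only a function and a list of inputs, and the executor decides where they run. `ThreadPoolExecutor` keeps the example self-contained. With the mpi4py package installed, `mpi4py.futures.MPIPoolExecutor` implements the same `Executor` interface and could, we assume, be passed in unchanged; its launch requirements are described in the mpi4py documentation.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Optional


def run_cases(func, cases, executor: Optional[Executor] = None):
    """Run func on every case independently (embarrassingly parallel) and
    return results in input order. Callers never see the transport."""
    if executor is None:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(func, cases))
    return list(executor.map(func, cases))


def trajectory_cost(delta_v):
    """Hypothetical stand-in for one mission-design evaluation."""
    return delta_v ** 2 + 1.0


print(run_cases(trajectory_cost, [0.5, 1.0, 2.0]))
```

Because only the executor changes, application code written against `run_cases` would not need rewriting when the backend moves from one transport to another. This mirrors the motivation the abstract gives for preserving PyPele's interface.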

Apollo lunar descent guidance

NASA Technical Reports Server (NTRS)

Klumpp, A. R.

1974-01-01

Apollo lunar-descent guidance transfers the Lunar Module from a near-circular orbit to touchdown, traversing a 17 deg central angle and a 15 km altitude in 11 min. A group of interactive programs in an onboard computer guide the descent, controlling attitude and the descent propulsion system throttle. A ground-based program pre-computes guidance targets. The concepts involved in this guidance are described. Explicit and implicit guidance are discussed, guidance equations are derived, and the earlier Apollo explicit equation is shown to be an inferior special case of the later implicit equation. Interactive guidance, by which the two-man crew selects a landing site in favorable terrain and directs the trajectory there, is discussed. Interactive terminal-descent guidance enables the crew to control the essentially vertical descent rate in order to land in minimum time with safe contact speed. The attitude maneuver routine uses concepts that make gimbal lock inherently impossible.
The Numerical Technique for the Landslide Tsunami Simulations Based on Navier-Stokes Equations

NASA Astrophysics Data System (ADS)

Kozelkov, A. S.

2017-12-01

The paper presents an integral technique simulating all phases of a landslide-driven tsunami. The technique is based on the numerical solution of the system of Navier-Stokes equations for multiphase flows. The numerical algorithm uses a fully implicit approximation method, in which the equations of continuity and momentum conservation are coupled through implicit summands of pressure gradient and mass flow. The method we propose removes severe restrictions on the time step and allows simulation of tsunami propagation to arbitrarily large distances. The landslide origin is simulated as an individual phase being a Newtonian fluid with its own density and viscosity and separated from the water and air phases by an interface. The basic formulas of equation discretization and expressions for coefficients are presented, and the main steps of the computation procedure are described in the paper. To enable simulations of tsunami propagation across wide water areas, we propose a parallel algorithm of the technique implementation, which employs an algebraic multigrid method. The implementation of the multigrid method is based on the global level and cascade collection algorithms that impose no limitations on the parallelization scale and make this technique applicable to petascale systems. We demonstrate the possibility of simulating all phases of a landslide-driven tsunami, including its generation, propagation and uprush. The technique has been verified against the problems supported by experimental data. The paper describes the mechanism of incorporating bathymetric data to simulate tsunamis in real water areas of the world ocean. Results of comparison with the nonlinear dispersion theory, which has demonstrated good agreement, are presented for the case of a historical tsunami of volcanic origin on the Montserrat Island in the Caribbean Sea.
Correlates of Physician Retention at Tripler Army Medical Center

DTIC Science & Technology

1991-12-01

benefits each program produces, the explicit and implicit values of those benefits, and the program’s direct and indirect costs. For the most part, there...data is also remarkably accurate (within sampling error). A survey of a particular group can give a very accurate picture of the groups values , beliefs...opportunity to teach and administer training programs. - Military medicine is exciting, challenging, and varied. Because of the value of written comments
A survey of parallel programming tools

NASA Technical Reports Server (NTRS)

Cheng, Doreen Y.

1991-01-01

This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Using a Delphi Technique to Seek Consensus Regarding Definitions, Descriptions and Classification of Terms Related to Implicit and Explicit Forms of Motor Learning

PubMed Central

Kleynen, Melanie; Braun, Susy M.; Bleijlevens, Michel H.; Lexis, Monique A.; Rasquin, Sascha M.; Halfens, Jos; Wilson, Mark R.; Beurskens, Anna J.; Masters, Rich S. W.

2014-01-01

Background Motor learning is central to domains such as sports and rehabilitation; however, often terminologies are insufficiently uniform to allow effective sharing of experience or translation of knowledge. A study using a Delphi technique was conducted to ascertain level of agreement between experts from different motor learning domains (i.e., therapists, coaches, researchers) with respect to definitions and descriptions of a fundamental conceptual distinction within motor learning, namely implicit and explicit motor learning. Methods A Delphi technique was embedded in multiple rounds of a survey designed to collect and aggregate informed opinions of 49 international respondents with expertise related to motor learning. The survey was administered via an online survey program and accompanied by feedback after each round. Consensus was considered to be reached if ≥70% of the experts agreed on a topic. Results Consensus was reached with respect to definitions of implicit and explicit motor learning, and seven common primary intervention strategies were identified in the context of implicit and explicit motor learning. Consensus was not reached with respect to whether the strategies promote implicit or explicit forms of learning. Discussion The definitions and descriptions agreed upon may aid translation and transfer of knowledge between domains in the field of motor learning. Empirical and clinical research is required to confirm the accuracy of the definitions and to explore the feasibility of the strategies that were identified in research, everyday practice and education. PMID:24968228
Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

NASA Technical Reports Server (NTRS)

Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

2002-01-01

In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of that program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.
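The core comparison step of relative debugging can be sketched as follows; re-execution and backtracking are not shown. The sketch assumes, hypothetically, that both runs dump the same variable as a list of values at each recorded step. It also assumes the parallel values have already been mapped back to serial index order, which is the job the abstract assigns to the parallelization tool's static analysis.

```python
import math


def first_divergence(serial_steps, parallel_steps, rel_tol=1e-12, abs_tol=0.0):
    """Return (step, index) of the earliest mismatch, or None if none.

    Scanning steps in execution order makes the first hit the earliest
    point where the two computations diverge, before the error spreads
    to the value the user eventually noticed. If a step's value lists
    differ in length, (step, None) is returned.
    """
    for step, (s_vals, p_vals) in enumerate(zip(serial_steps, parallel_steps)):
        if len(s_vals) != len(p_vals):
            return step, None
        for idx, (s, p) in enumerate(zip(s_vals, p_vals)):
            if not math.isclose(s, p, rel_tol=rel_tol, abs_tol=abs_tol):
                return step, idx
    return None


serial = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
parallel = [[1.0, 2.0], [3.0, 4.5], [5.0, 7.0]]
origin = first_divergence(serial, parallel)   # the step-2 error is downstream
```

A tolerance is used because a correct parallel version may legitimately reorder floating-point operations. Setting both tolerances to zero turns the check into an exact, bit-for-bit comparison.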
Multicultural Education vs. Implicit and Explicit Ethnocentric Education: Text Analysis of a Contemporary Israeli Value Education Program

ERIC Educational Resources Information Center

Reingold, Roni; Zamir, Sara

2017-01-01

In the year 2000, Israel purportedly adopted a multicultural educational policy. It replaced the covert assimilation policy, which was referred to as "the integration policy." The aim of the present study was to analyse the contemporary Israeli program of value education. Using the method of content analysis, the present study sought to…
Feedback data sources that inform physician self-assessment.

PubMed

Lockyer, Jocelyn; Armson, Heather; Chesluk, Benjamin; Dornan, Timothy; Holmboe, Eric; Loney, Elaine; Mann, Karen; Sargeant, Joan

2011-01-01

Self-assessment is a process of interpreting data about one's performance and comparing it to explicit or implicit standards. To examine the external data sources physicians used to monitor themselves. Focus groups were conducted with physicians who participated in three practice improvement activities: a multisource feedback program; a program providing patient and chart audit data; and practice-based learning groups. We used grounded theory strategies to understand the external sources that stimulated self-assessment and how they worked. Data from seven focus groups (49 physicians) were analyzed. Physicians used information from structured programs, other educational activities, professional colleagues, and patients. Data were of varying quality, often from non-formal sources with implicit (not explicit) standards. Mandatory programs elicited variable responses, whereas data and activities the physicians selected themselves were more likely to be accepted. Physicians used the information to create a reference point against which they could weigh their performance using it variably depending on their personal interpretation of its accuracy, application, and utility. Physicians use and interpret data and standards of varying quality to inform self-assessment. Physicians may benefit from regular and routine feedback and guidance on how to seek out data for self-assessment.
Resident perceptions of the impact of duty hour restrictions on resident-attending interactions: an exploratory study.

PubMed

Gerjevic, Kristen A; Rosenbaum, Marcy E; Suneja, Manish

2017-07-18

The institution of duty hour reforms by the Accreditation Council for Graduate Medical Education in 2003 has created a learning environment where residents are consistently looking for input from attending physicians with regard to balancing duty hour regulations and providing quality patient care. There is a paucity of literature regarding resident perceptions of attending physician actions or attitudes towards work hour restrictions. The purpose of this study was to identify attending physician behaviors that residents perceived as supportive or unsupportive of their compliance with duty hour regulations. Focus group interviews were conducted with residents exploring their perceptions of how duty hour regulations impact their interactions with attending physicians. Qualitative analysis identified key themes in residents' experiences interacting with faculty in regard to duty hour regulations. Forty residents from five departments in two hospital systems participated. Discussion of these interactions highlighted that attending physicians demonstrate behaviors that explicitly or implicitly either lend their support and understanding of residents' need to comply with these regulations or imply a lack of support and understanding. Three major themes that contributed to the ease or difficulty in addressing duty hour regulations included attending physicians' explicit communication of expectations, implicit non-verbal and verbal cues and the program's organizational culture. Resident physicians' perception of attending physicians' explicit and implicit communication and of residency programs' organizational culture has an impact on residents' experience with duty hour restrictions. Residency faculty and programs could benefit from explicitly addressing and supporting the challenges that residents perceive in complying with duty hour restrictions.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process, scalability, parallelization overhead, and comparison with hand-parallelized and hand-optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing the performance gains predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
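The abstract does not state the form of its performance model. As a hedged illustration only, a common starting point for directive-parallelized code is Amdahl's law, extended here with a hypothetical overhead term that grows with the processor count and stands in for parallelization and data-locality overhead. The function name `predicted_speedup` and the linear overhead form are assumptions of this sketch, not the authors' model.

```python
def predicted_speedup(serial_time: float, parallel_fraction: float,
                      processors: int, overhead_per_proc: float = 0.0) -> float:
    """Amdahl-style speedup with a hypothetical linear overhead term.

    T_p = T_s * ((1 - f) + f / p) + c * p, and speedup = T_s / T_p.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # The serial part runs unchanged; the parallel part is divided across p processors.
    t_parallel = serial_time * ((1.0 - parallel_fraction) + parallel_fraction / processors)
    # Hypothetical overhead (e.g. remote-memory traffic on ccNUMA) growing with p.
    t_parallel += overhead_per_proc * processors
    return serial_time / t_parallel
```

With a parallel fraction of 0.9 the predicted speedup can never exceed 10, however many processors are used, and any positive overhead term lowers it further, which is one way to read the abstract's conclusion that locality overhead decides whether predicted gains are realized.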
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

2015-01-01

This paper proposes a novel SPMD programming model for OpenACC. Our model integrates the different granularities of parallelism, from vector-level parallelism to node-level parallelism, into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model, whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance on a GPU-based supercomputer using three benchmark applications.
Rotor cascade shape optimization with unsteady passing wakes using implicit dual time stepping method

NASA Astrophysics Data System (ADS)

Lee, Eun Seok

2000-10-01

Improved aerodynamic performance of a turbine cascade shape can be achieved through an understanding of the flow field associated with the stator-rotor interaction. In this research, an axial gas turbine airfoil cascade shape is optimized for improved aerodynamic performance by using an unsteady Navier-Stokes solver and a parallel genetic algorithm. The objective of the research is twofold: (1) to develop a computational fluid dynamics code with a faster convergence rate and unsteady flow simulation capabilities, and (2) to optimize a turbine airfoil cascade shape with unsteady passing wakes for improved aerodynamic performance. The computer code solves the Reynolds-averaged Navier-Stokes equations. It is based on the explicit, finite difference, Runge-Kutta time marching scheme and the Diagonalized Alternating Direction Implicit (DADI) scheme, with Baldwin-Lomax algebraic and k-epsilon turbulence modeling. Improvements to the code focused on cascade shape design capability, convergence acceleration, and the unsteady formulation. First, the inverse shape design method was implemented in the code to provide the design capability, where a surface transpiration concept was employed as an inverse technique to modify the geometry to satisfy the user-specified pressure distribution on the airfoil surface. Second, an approximation storage multigrid method was implemented as an acceleration technique. Third, the preconditioning method was adopted to speed up the convergence rate when solving low Mach number flows. Finally, the implicit dual time stepping method was incorporated to simulate unsteady flow fields. For validation of the unsteady code, Stokes's second problem and Poiseuille flow were chosen, and the computed results were compared with analytic solutions.
To test the code's ability to capture natural unsteady flow phenomena, vortex shedding past a cylinder and shock oscillation over a bicircular airfoil were simulated and compared with experiments and other research results. The rotor cascade shape optimization with unsteady passing wakes was performed to obtain improved aerodynamic performance using the unsteady Navier-Stokes solver. Two objective functions were defined, minimization of total pressure loss and maximization of lift, while the mass flow rate was fixed. A parallel genetic algorithm was used as the optimizer, and the penalty method was introduced. The objective functions of all individuals were computed simultaneously using a 32-processor distributed-memory computer. One optimization took about four days.
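The idea behind implicit dual time stepping can be shown on a scalar ODE du/dt = R(u). Each physical step is implicit, here first-order backward Euler, (u - u_n)/dt = R(u). Instead of solving that equation directly, a pseudo-time derivative is added and marched to steady state, so an explicit-style inner solver can be reused. This is a minimal sketch under stated assumptions: the record does not give the physical-time order (production codes often use second-order backward differences), the inner loop here is plain explicit relaxation rather than DADI or multigrid, and `dual_time_step` is a hypothetical name.

```python
from typing import Callable


def dual_time_step(u_n: float, rhs: Callable[[float], float], dt: float,
                   dtau: float, max_inner: int = 10_000, tol: float = 1e-12) -> float:
    """Advance du/dt = rhs(u) by one implicit-Euler step of size dt.

    The implicit equation (u - u_n)/dt = rhs(u) is solved by marching
    du/dtau = rhs(u) - (u - u_n)/dt to steady state in pseudo-time.
    """
    u = u_n
    for _ in range(max_inner):
        # Unsteady residual: zero exactly when u satisfies the implicit equation.
        residual = rhs(u) - (u - u_n) / dt
        if abs(residual) < tol:
            return u
        u = u + dtau * residual  # explicit pseudo-time update
    raise RuntimeError("pseudo-time iterations did not converge")
```

For du/dt = -u with dt = 0.1, each converged step multiplies u by 1/1.1, the backward-Euler factor. The pseudo step dtau must keep |1 - dtau * (1 + 1/dt)| below 1 for this explicit inner loop to converge.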
77 FR 73611 - Notice of Public Information Collection Requirements Submitted to OMB for Review

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-11

... the operation of a program compared to a set of explicit or implicit standards, as a means of..., filing of petitions and applications and agency statements of organization and functions are examples...
Forebody and base region real gas flow in severe planetary entry by a factored implicit numerical method. II - Equilibrium reactive gas

NASA Technical Reports Server (NTRS)

Davy, W. C.; Green, M. J.; Lombard, C. K.

1981-01-01

The factored-implicit, gas-dynamic algorithm has been adapted to the numerical simulation of equilibrium reactive flows. Changes required in the perfect gas version of the algorithm are developed, and the method of coupling gas-dynamic and chemistry variables is discussed. A flow-field solution that approximates a Jovian entry case was obtained by this method and compared with the same solution obtained by HYVIS, a computer program much used for the study of planetary entry. Comparison of surface pressure distribution and stagnation line shock-layer profiles indicates that the two solutions agree well.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
OTIS 3.2 Software Released

NASA Technical Reports Server (NTRS)

Riehl, John P.; Sjauw, Waldy K.

2004-01-01

Trajectory, mission, and vehicle engineers concern themselves with finding the best way for an object to get from one place to another. These engineers rely upon special software to assist them in this. For a number of years, many engineers have used the OTIS program for this assistance. With OTIS, an engineer can fully optimize trajectories for airplanes, launch vehicles like the space shuttle, interplanetary spacecraft, and orbital transfer vehicles. OTIS provides four modes of operation, with each mode providing successively stronger optimization capability. The most powerful mode uses a mathematical method called implicit integration to solve what engineers and mathematicians call the optimal control problem. OTIS 3.2, which was developed at the NASA Glenn Research Center, is the latest release of this industry workhorse and features new capabilities for parameter optimization and mission design. OTIS stands for Optimal Control by Implicit Simulation, and it is implicit integration that makes OTIS so powerful at solving trajectory optimization problems. Why is this so important? The optimization process not only determines how to get from point A to point B, but it can also determine how to do this with the least amount of propellant, with the lightest starting weight, or in the fastest time possible while avoiding certain obstacles along the way. There are numerous conditions that engineers can use to define optimal, or best. OTIS provides a framework for defining the starting and ending points of the trajectory (point A and point B), the constraints on the trajectory (requirements like "avoid these regions where obstacles occur"), and what is being optimized (e.g., minimize propellant). The implicit integration method can find solutions to very complicated problems when there is not a lot of information available about what the optimal trajectory might be. 
The method was first developed for solving two-point boundary value problems and was adapted for use in OTIS. Implicit integration usually allows OTIS to find solutions to problems much faster than programs that use explicit integration and parametric methods. Consequently, OTIS is best suited to solving very complicated and highly constrained problems.
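The record describes implicit integration only in general terms. One common form of it in trajectory optimization is collocation. Instead of integrating the equations of motion forward, the states at all grid nodes become unknowns, and the optimizer drives "defect" constraints to zero together with the other constraints. The sketch below computes trapezoidal-rule defects for a scalar system x' = f(x, u). The trapezoidal rule is chosen for brevity and is an assumption, not necessarily the scheme OTIS uses, and `trapezoid_defects` is a hypothetical name.

```python
from typing import Callable, List, Sequence


def trapezoid_defects(xs: Sequence[float], us: Sequence[float], h: float,
                      f: Callable[[float, float], float]) -> List[float]:
    """Collocation defects for x' = f(x, u) on a uniform grid of step h.

    A candidate trajectory satisfies the dynamics (to the rule's accuracy)
    when every defect is zero. An optimizer treats all nodes as unknowns
    at once instead of marching forward step by step.
    """
    return [xs[k + 1] - xs[k] - 0.5 * h * (f(xs[k], us[k]) + f(xs[k + 1], us[k + 1]))
            for k in range(len(xs) - 1)]
```

For x' = -x with zero control and h = 0.1, the sequence x_k = r^k with r = 0.95/1.05 has all-zero defects, while perturbing any interior node makes the adjacent defects nonzero, which the optimizer would then have to remove.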
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

PubMed

Stone, John E; Gohara, David; Shi, Guochun

2010-05-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
Empathy: Gender effects in brain and behavior

PubMed Central

Christov-Moore, Leonardo; Simpson, Elizabeth A.; Coudé, Gino; Grigaityte, Kristina; Iacoboni, Marco; Ferrari, Pier Francesco

2016-01-01

Evidence suggests that there are differences in the capacity for empathy between males and females. However, how deep do these differences go? Stereotypically, females are portrayed as more nurturing and empathetic, while males are portrayed as less emotional and more cognitive. Some authors suggest that observed gender differences might be largely due to cultural expectations about gender roles. However, empathy has both evolutionary and developmental precursors, and can be studied using implicit measures, aspects that can help elucidate the respective roles of culture and biology. This article reviews evidence from ethology, social psychology, economics, and neuroscience to show that there are fundamental differences in implicit measures of empathy, with parallels in development and evolution. Studies in nonhuman animals and younger human populations (infants/children) offer converging evidence that sex differences in empathy have phylogenetic and ontogenetic roots in biology and are not merely cultural byproducts driven by socialization. We review how these differences may have arisen in response to males’ and females’ different roles throughout evolution. Examinations of the neurobiological underpinnings of empathy reveal important quantitative gender differences in the basic networks involved in affective and cognitive forms of empathy, as well as a qualitative divergence between the sexes in how emotional information is integrated to support decision making processes. Finally, the study of gender differences in empathy can be improved by designing studies with greater statistical power and considering variables implicit in gender (e.g., sexual preference, prenatal hormone exposure). These improvements may also help uncover the nature of neurodevelopmental and psychiatric disorders in which one sex is more vulnerable to compromised social competence associated with impaired empathy. PMID:25236781
GPU-accelerated Lattice Boltzmann method for anatomical extraction in patient-specific computational hemodynamics

NASA Astrophysics Data System (ADS)

Yu, H.; Wang, Z.; Zhang, C.; Chen, N.; Zhao, Y.; Sawchuk, A. P.; Dalsing, M. C.; Teague, S. D.; Cheng, Y.

2014-11-01

Existing research on patient-specific computational hemodynamics (PSCH) relies heavily on software for anatomical extraction of blood arteries. Because of the gap between medical image processing and CFD, data reconstruction and mesh generation must be done with existing commercial software. This increases the computational burden and introduces inaccuracy during data transformation, thus limiting the medical applications of PSCH. We use the lattice Boltzmann method (LBM) to solve the level-set equation over an Eulerian distance field and implicitly and dynamically segment the artery surfaces from radiological CT/MRI imaging data. The segments feed seamlessly into the LBM-based CFD computation of PSCH, so explicit mesh construction and extra data management are avoided. The LBM is ideally suited for GPU (graphics processing unit)-based parallel computing. The parallel acceleration over GPU achieves excellent performance in PSCH computation. An application study will be presented which segments an aortic artery from a chest CT dataset and models PSCH of the segmented artery.
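Why LBM maps well to GPUs can be seen in its two-phase update. A purely local collision at every node is followed by a fixed-pattern streaming shift to neighbors. The sketch below is a generic D1Q3 BGK lattice Boltzmann solver for 1D diffusion on a periodic grid, not the authors' level-set solver. The weights (2/3, 1/6, 1/6) and the lattice-unit diffusivity D = (tau - 1/2)/3 are the standard D1Q3 choices, and the function name is hypothetical.

```python
from typing import List

WEIGHTS = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # rest, +1 (right), -1 (left)


def lbm_diffusion_1d(rho0: List[float], tau: float, steps: int) -> List[float]:
    """D1Q3 BGK lattice Boltzmann solver for 1D diffusion on a periodic grid.

    In lattice units the diffusivity is D = (tau - 1/2) / 3.
    """
    n = len(rho0)
    f = [[w * r for r in rho0] for w in WEIGHTS]  # start at equilibrium
    for _ in range(steps):
        rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
        # Collision: local relaxation toward the equilibrium w_i * rho at each node.
        f = [[f[i][x] - (f[i][x] - WEIGHTS[i] * rho[x]) / tau for x in range(n)]
             for i in range(3)]
        # Streaming: a fixed neighbor shift with no data-dependent access pattern.
        f[1] = [f[1][(x - 1) % n] for x in range(n)]
        f[2] = [f[2][(x + 1) % n] for x in range(n)]
    return [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
```

Because each node's collision touches only its own data and streaming is a uniform shift, both phases become one GPU thread per node without synchronization inside a phase. Starting from a unit spike with tau = 1, total mass is conserved and the variance after t steps is 2Dt = t/3.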
Parallel Adaptive High-Order CFD Simulations Characterizing Cavity Acoustics for the Complete SOFIA Aircraft

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2014-01-01

This paper presents one-of-a-kind MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting of the telescope. A fourth-order Runge-Kutta scheme in time and a fifth-order WENO-5Z scheme in space were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement (AMR) framework. Strong scaling studies on NASA's Pleiades supercomputer with up to 32,000 cores and 4 billion cells show excellent scaling. Dynamic load balancing based on the execution time of individual AMR blocks addresses irregularities caused by the highly complex geometry. Limits to scaling beyond 32K cores are identified, and targeted code optimizations are discussed.
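For readers unfamiliar with the time integrator named above, a classical four-stage Runge-Kutta step is sketched below for a system y' = f(t, y). The record does not say which fourth-order variant the SOFIA code uses (low-storage forms are common in large CFD codes), so this textbook form and the name `rk4_step` are assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rk4_step(f: Callable[[float, Sequence[float]], Vector],
             t: float, y: Sequence[float], h: float) -> Vector:
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    # Weighted average of the four stage slopes (1, 2, 2, 1) / 6.
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
```

Integrating y' = y from 0 to 1 with h = 0.1 reproduces e to within a few parts in a million, and halving h cuts the error by roughly a factor of 16, the signature of a fourth-order method.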

Efficient solution of parabolic equations by Krylov approximation methods

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Y.

1990-01-01

Numerical techniques for solving parabolic equations by the method of lines are addressed. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. The resulting approximation thus consists of applying an evolution operator of very small dimension to a known vector, which is in turn computed accurately by exploiting well-known rational approximations to the exponential. Because the rational approximation is applied only to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some relevant approximation and stability issues are discussed. We present some numerical experiments with the method and compare its performance with a few explicit and implicit algorithms.
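The projection idea can be sketched for a symmetric matrix A, such as the centered-difference discretization of u_t = u_xx. Here the Krylov basis V_m comes from the Lanczos process, and exp(tA)v is approximated by beta * V_m * exp(t T_m) * e_1, with T_m the small tridiagonal projection and beta = ||v||. One deliberate swap: the abstract evaluates the small exponential with rational approximations, whereas this sketch uses an eigendecomposition of T_m for brevity. NumPy and the name `krylov_expv` are assumptions of the sketch. Only products with A touch the large matrix, which is the property the abstract credits for easy parallelization.

```python
import numpy as np


def krylov_expv(A: np.ndarray, v: np.ndarray, t: float, m: int) -> np.ndarray:
    """Approximate exp(t*A) @ v for symmetric A using an m-step Lanczos basis."""
    n = v.shape[0]
    m = min(m, n)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    offdiag = np.zeros(m)  # offdiag[j] couples basis vectors j and j + 1
    V[:, 0] = v / beta
    k = m
    for j in range(m):
        w = A @ V[:, j]  # the only operation involving the large matrix
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j > 0:
            w -= offdiag[j - 1] * V[:, j - 1]
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # full reorthogonalization
        if j + 1 < m:
            offdiag[j] = np.linalg.norm(w)
            if offdiag[j] < 1e-12:  # invariant subspace found: stop early
                k = j + 1
                break
            V[:, j + 1] = w / offdiag[j]
    T = np.diag(alpha[:k]) + np.diag(offdiag[:k - 1], 1) + np.diag(offdiag[:k - 1], -1)
    evals, evecs = np.linalg.eigh(T)  # small k-by-k problem
    small = evecs @ (np.exp(t * evals) * evecs[0, :])  # exp(t*T) @ e_1
    return beta * (V[:, :k] @ small)
```

With 20 Lanczos vectors for a 50-point heat-equation operator the result already agrees closely with the exact exponential computed from a full eigendecomposition.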
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

With the advent of parallel hardware and software technologies, users are faced with the challenge of choosing the programming paradigm best suited to the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed for machines with single-core CPUs, cannot efficiently utilize the parallelism available on multi-core processors. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write parallel code. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring the programmer's expertise in parallel programming. For five different benchmarks, Rubus achieved an average speedup of 34.54 times compared with Java on a basic GPU having only 96 cores.
For a matrix multiplication benchmark, Rubus achieved an average execution speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
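What an automatic parallelizer has to establish before distributing a loop can be illustrated by hand with the matrix multiplication benchmark. Each output row depends only on read-only inputs, so the outer loop carries no dependence and its iterations may run concurrently. The sketch below does this by hand with Python threads. It is not Rubus output (Rubus targets GPUs from Java), the function names are hypothetical, and in CPython the GIL means the sketch shows the structure of the transformation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Matrix = List[List[float]]


def row_times(row: Sequence[float], b_cols: Sequence[Sequence[float]]) -> List[float]:
    """One output row: dot products of a row of A with every column of B."""
    return [sum(x * y for x, y in zip(row, col)) for col in b_cols]


def parallel_matmul(a: Matrix, b: Matrix, workers: int = 4) -> Matrix:
    """Row-partitioned matrix product C = A @ B.

    Output rows read only A and B and write disjoint results, so there is
    no loop-carried dependence across the outer loop.
    """
    b_cols = list(zip(*b))  # transpose once so each task reads columns as tuples
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so rows land in place.
        return list(pool.map(lambda row: row_times(row, b_cols), a))
```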
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed for machines with single-core CPUs, cannot efficiently utilize the parallelism available on multi-core processors. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write parallel code. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring the programmer’s expertise in parallel programming. For five different benchmarks, Rubus achieved an average speedup of 34.54 times compared with Java on a basic GPU having only 96 cores.
For a matrix multiplication benchmark, Rubus achieved an average execution speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Efficient partitioning and assignment on programs for multiprocessor execution

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1993-01-01

The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and assignment of program elements are of great importance, since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed, which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large-scale (procedure- or subroutine-level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for approximating the performance of finite queueing networks is developed. This model uses the decomposition approach combined with the efficiency of product-form solutions.
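The abstract examines assignment heuristics without naming them. As an illustrative stand-in, the sketch below uses longest-processing-time-first (LPT) greedy list scheduling, a standard heuristic that sorts tasks by cost and gives each one to the currently least-loaded processor. It ignores the data dependencies and communication costs that the abstract identifies as the real obstacles, and `lpt_assign` is a hypothetical name.

```python
import heapq
from typing import List, Sequence, Tuple


def lpt_assign(costs: Sequence[float], processors: int) -> Tuple[List[List[int]], float]:
    """Longest-processing-time-first assignment of independent tasks.

    Returns the task indices assigned to each processor and the makespan
    (the largest per-processor load).
    """
    heap = [(0.0, p) for p in range(processors)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(processors)]
    # Consider the most expensive tasks first; sort is stable for equal costs.
    for task in sorted(range(len(costs)), key=lambda t: costs[t], reverse=True):
        load, p = heapq.heappop(heap)  # least-loaded processor
        assignment[p].append(task)
        heapq.heappush(heap, (load + costs[task], p))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

For task costs [5, 4, 3, 3, 3] on two processors, LPT yields a makespan of 10 while the optimum is 9 ({5, 4} and {3, 3, 3}), within LPT's known worst-case bound of 4/3 - 1/(3m) of optimal for m processors.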
A mechanism for efficient debugging of parallel programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, B.P.; Choi, J.D.

1988-01-01

This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.
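The abstract does not detail its race-detection method, so the following is a generic illustration rather than the authors' mechanism. A common way to define a race is with the happens-before order. Two accesses to the same variable race if at least one is a write and neither happens before the other, which can be decided by comparing vector clocks. The `Access` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Access:
    process: int
    var: str
    is_write: bool
    clock: Tuple[int, ...]  # vector clock of the process at this access


def happens_before(u: Tuple[int, ...], v: Tuple[int, ...]) -> bool:
    """Vector-clock order: u precedes v iff u <= v componentwise and u != v."""
    return all(x <= y for x, y in zip(u, v)) and u != v


def find_races(trace: List[Access]) -> List[Tuple[Access, Access]]:
    """Pairs of same-variable accesses, at least one a write, left unordered."""
    races = []
    for x, y in combinations(trace, 2):
        if x.var != y.var or not (x.is_write or y.is_write):
            continue
        if not happens_before(x.clock, y.clock) and not happens_before(y.clock, x.clock):
            races.append((x, y))
    return races
```

Suppose process 0 writes x at clock (1, 0), writes y at (2, 0), then sends a message at (3, 0). Process 1 writes x at (0, 1), receives that message (its clock becomes (3, 2)), and reads y at (3, 3). The two writes to x are unordered and form a race, while the write and read of y are ordered by the message and do not.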
The Chorus Conflict and Loss of Separation Resolution Algorithms

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Hagen, George E.; Maddalon, Jeffrey M.

2013-01-01

The Chorus software is designed to investigate near-term, tactical conflict and loss of separation detection and resolution concepts for air traffic management. This software is currently being used in two different problem domains: en-route self-separation, and sense and avoid for unmanned aircraft systems. This paper describes the core resolution algorithms that are part of Chorus. The combination of several features of the Chorus program distinguishes this software from other approaches to conflict and loss of separation resolution. First, the program stores a history of state information over time, which enables it to handle communication dropouts and take advantage of previous input data. Second, the underlying conflict algorithms find resolutions that solve the most urgent conflict, but also seek to prevent secondary conflicts with the other aircraft. Third, if the program is run on multiple aircraft and two aircraft maneuver at the same time, the result will be implicitly coordinated. This implicit coordination property is established by ensuring that a resolution produced by Chorus complies with a mathematically defined criterion whose correctness has been formally verified. Fourth, the program produces both instantaneous solutions and kinematic solutions, which are based on simple acceleration models. Finally, the program provides resolutions for recovery from loss of separation. Different versions of this software are implemented in Java and C++.
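Resolution algorithms like these sit on top of a conflict-detection test. The abstract does not give Chorus's detection math, so the sketch below shows only the standard straight-line closest-point-of-approach (CPA) check in the horizontal plane. It projects relative motion forward, finds the time of minimum separation within a look-ahead horizon, and compares the miss distance with a separation minimum. The formally verified Chorus criteria are not reproduced, and `cpa_conflict` is a hypothetical name.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def cpa_conflict(rel_pos: Vec2, rel_vel: Vec2, horizon: float,
                 min_sep: float) -> Tuple[bool, float]:
    """Straight-line closest-point-of-approach check between two aircraft.

    rel_pos, rel_vel: intruder position and velocity relative to ownship.
    Returns (conflict?, time of closest approach clamped to [0, horizon]).
    """
    sx, sy = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    # The unconstrained minimizer of |s + t v|^2 is t = -(s . v) / |v|^2.
    t_cpa = 0.0 if speed_sq == 0.0 else -(sx * vx + sy * vy) / speed_sq
    t_cpa = min(max(t_cpa, 0.0), horizon)  # look ahead only within the horizon
    miss = math.hypot(sx + t_cpa * vx, sy + t_cpa * vy)
    return miss < min_sep, t_cpa
```

A head-on intruder 10 units away closing at 2 units per time unit reaches zero separation at t = 5. With a look-ahead horizon of only 2 the projected miss distance is 6, so no conflict is declared against a separation minimum of 5.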
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

PubMed Central

Stone, John E.; Gohara, David; Shi, Guochun

2010-01-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981
Study of the mapping of Navier-Stokes algorithms onto multiple-instruction/multiple-data-stream computers

NASA Technical Reports Server (NTRS)

Eberhardt, D. S.; Baganoff, D.; Stevens, K.

1984-01-01

Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code using this algorithm is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility, which consists of two VAX-11/780s with a common MA780 multi-ported memory. The timing results indicated speedups exceeding 1.9 for characteristic CFD runs.
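A property that makes approximate-factored implicit schemes attractive for parallel processing is that each factor reduces to independent one-dimensional banded solves, one per grid line. The sketch below shows a scalar tridiagonal solve (Thomas algorithm) and a loop over lines that could be split across processors. Factored Navier-Stokes schemes often produce block-tridiagonal systems instead, so the scalar form is a simplification for brevity, and the function names are hypothetical.

```python
from typing import List, Sequence


def thomas_solve(a: Sequence[float], b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side. Assumes pivoting is unnecessary, e.g. a diagonally
    dominant or symmetric positive definite matrix.
    """
    n = len(b)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def solve_all_lines(a, b, c, rhs_lines):
    # Each grid line is an independent solve, so different lines could be
    # handed to different MIMD processors; shown serially here.
    return [thomas_solve(a, b, c, d) for d in rhs_lines]
```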
Genetic algorithms using SISAL parallel programming language

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tejada, S.

1994-05-06

Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language that is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I discuss the implementation and performance of parallel genetic algorithms in SISAL.
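The steps listed above can be sketched in a few lines. Since SISAL toolchains are not widely available, this is a hedged Python illustration rather than the paper's code. Fitness evaluations are independent of one another, so they are mapped in parallel, while selection, crossover and mutation share one seeded random generator here and run serially. The OneMax fitness (count of 1 bits), tournament selection, one-point crossover, elitism and all parameter values are illustrative assumptions. In CPython the thread pool shows the structure of the parallel map rather than delivering a speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List

Genome = List[int]


def onemax(genome: Genome) -> int:
    """Toy fitness: number of 1 bits (a hypothetical stand-in objective)."""
    return sum(genome)


def tournament(pop: List[Genome], fitness: List[int], rng: random.Random) -> Genome:
    """Size-2 tournament selection."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if fitness[i] >= fitness[j] else pop[j]


def evolve(n_bits: int = 20, pop_size: int = 30, generations: int = 40,
           p_mut: float = 0.02, seed: int = 1) -> List[int]:
    """Run a small GA and return the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            # Fitness cases are independent, so they can be evaluated in parallel.
            fitness = list(pool.map(onemax, pop))
            best = max(range(pop_size), key=fitness.__getitem__)
            history.append(fitness[best])
            children = [list(pop[best])]  # elitism: best individual survives unchanged
            while len(children) < pop_size:
                p1 = tournament(pop, fitness, rng)
                p2 = tournament(pop, fitness, rng)
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]  # mutation
                children.append(child)
            pop = children
    return history
```

Because the best individual is copied unchanged into the next generation, the recorded best fitness can never decrease from one generation to the next.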
An Expert System for the Development of Efficient Parallel Code

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

2004-01-01

We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading based on pthreads (POSIX threads, where POSIX stands for Portable Operating System Interface) is also used in the diffraction propagation of light in MACOS. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
pyJac: Analytical Jacobian generator for chemical kinetics

NASA Astrophysics Data System (ADS)

Niemeyer, Kyle E.; Curtis, Nicholas J.; Sung, Chih-Jen

2017-06-01

Accurate simulations of combustion phenomena require the use of detailed chemical kinetics in order to capture limit phenomena such as ignition and extinction as well as predict pollutant formation. However, the chemical kinetic models for hydrocarbon fuels of practical interest typically have large numbers of species and reactions and exhibit high levels of mathematical stiffness in the governing differential equations, particularly for larger fuel molecules. In order to integrate the stiff equations governing chemical kinetics, reactive-flow simulations generally rely on implicit algorithms that require frequent Jacobian matrix evaluations. Some in situ and a posteriori computational diagnostics methods, including computational singular perturbation and chemical explosive mode analysis, also require accurate Jacobian matrices. Typically, these are approximated numerically with finite differences, but for larger chemical kinetic models this poses significant computational demands, since the number of chemical source term evaluations scales with the square of the species count. Furthermore, existing analytical Jacobian tools do not optimize evaluations or support emerging SIMD processors such as GPUs. Here we introduce pyJac, a Python-based open-source program that generates analytical Jacobian matrices for use in chemical kinetics modeling and analysis. In addition to producing the necessary customized source code for evaluating reaction rates (including all modern reaction rate formulations), the chemical source terms, and the Jacobian matrix, pyJac uses an optimized evaluation order to minimize computational and memory operations. As a demonstration, we first establish the correctness of the Jacobian matrices for kinetic models of hydrogen, methane, ethylene, and isopentanol oxidation (with the number of species ranging from 13 to 360) by showing agreement within 0.001% of matrices obtained via automatic differentiation.
We then demonstrate the performance achievable on CPUs and GPUs using pyJac via matrix evaluation timing comparisons; the routines produced by pyJac outperformed first-order finite differences by 3-7.5 times and the existing analytical Jacobian software TChem by 1.1-2.2 times on a single-threaded basis. It is noted that TChem is not thread-safe, while pyJac is easily parallelized, and hence can greatly outperform TChem on multicore CPUs. The Jacobian matrix generator we describe here will be useful for reducing the cost of integrating chemical source terms with implicit algorithms in particular and algorithms that require an accurate Jacobian matrix in general. Furthermore, the open-source release of the program and Python-based implementation will enable wide adoption.
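To make the cost argument above concrete, the sketch below compares a first-order forward-difference Jacobian with an analytical one for a hypothetical two-step mechanism A → B → C. The rate constants `K1` and `K2` are illustrative values, not taken from any real kinetic model, and this is not code generated by pyJac; it only shows that the finite-difference route needs one extra evaluation of the full source vector per species, whereas the analytical Jacobian is written down once.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

K1, K2 = 2.0, 0.5  # hypothetical first-order rate constants


def source_terms(y: Vector) -> Vector:
    """Chemical source terms dy/dt for the toy mechanism A -> B -> C."""
    a, b, _c = y
    r1 = K1 * a  # rate of A -> B
    r2 = K2 * b  # rate of B -> C
    return [-r1, r1 - r2, r2]


def analytical_jacobian(y: Vector) -> Matrix:
    """Exact J[i][j] = d(source_i)/d(y_j); constant because both steps are first order."""
    return [
        [-K1, 0.0, 0.0],
        [K1, -K2, 0.0],
        [0.0, K2, 0.0],
    ]


def fd_jacobian(f: Callable[[Vector], Vector], y: Vector,
                eps: float = 1e-7) -> Tuple[Matrix, int]:
    """First-order forward differences; returns (Jacobian, number of calls to f)."""
    n = len(y)
    f0 = f(y)
    calls = 1
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):  # one perturbed evaluation of the full source vector per species
        h = eps * max(1.0, abs(y[j]))
        y_pert = list(y)
        y_pert[j] += h
        fj = f(y_pert)
        calls += 1
        for i in range(n):
            jac[i][j] = (fj[i] - f0[i]) / h
    return jac, calls


y0 = [1.0, 0.2, 0.0]
J_exact = analytical_jacobian(y0)
J_fd, n_calls = fd_jacobian(source_terms, y0)
max_err = max(abs(J_fd[i][j] - J_exact[i][j]) for i in range(3) for j in range(3))
print(n_calls, max_err)
```

For n species the loop calls the source function n + 1 times, and each call produces n entries, which is consistent with the quadratic growth in source-term work described above.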
Solving Integer Programs from Dependence and Synchronization Problems

DTIC Science & Technology

1993-03-01

Jaspal Subhlok, March 1993, CMU-CS-93-130, School of Computer Science. ...method is an exact and efficient way of solving integer programming problems arising in dependence and synchronization analysis of parallel programs... Keywords: exact dependence testing, integer programming, parallelizing compilers, parallel program analysis, synchronization analysis.
Building Bridges Between Structural and Program Evaluation Approaches to Evaluating Policy

PubMed Central

Heckman, James J.

2011-01-01

This paper compares the structural approach to economic policy analysis with the program evaluation approach. It offers a third way to do policy analysis that combines the best features of both approaches. We illustrate the value of this alternative approach by making the implicit economics of LATE explicit, thereby extending the interpretability and range of policy questions that LATE can answer. PMID:21743749
Scalable Preconditioners for Structure Preserving Discretizations of Maxwell Equations in First Order Form

DOE PAGES

Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.

2018-05-01

Here, we report that multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure-preserving (also termed physics-compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell-type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.
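The stability argument above can be illustrated with the scalar model problem y' = -λy, where a large λ plays the role of a fast, stiff mode. The sketch below uses arbitrary illustrative values for λ and the step size, and it is not the block preconditioner from the paper; it only shows forward Euler diverging once λΔt exceeds 2, while backward Euler stays stable at the same step.

```python
def forward_euler(lam: float, dt: float, steps: int, y0: float = 1.0) -> float:
    """Explicit update y_{n+1} = y_n + dt * f(y_n) with f(y) = -lam * y."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def backward_euler(lam: float, dt: float, steps: int, y0: float = 1.0) -> float:
    """Implicit update y_{n+1} = y_n + dt * f(y_{n+1}); linear f gives a closed form."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y


lam, dt, steps = 1000.0, 0.01, 20  # lam * dt = 10, far beyond the explicit limit of 2
y_explicit = forward_euler(lam, dt, steps)   # amplification factor 1 - lam*dt = -9 per step
y_implicit = backward_euler(lam, dt, steps)  # amplification factor 1/(1 + lam*dt) = 1/11 per step
print(y_explicit, y_implicit)
```

For a linear system y' = -Ay the implicit step becomes a linear solve, (I + Δt A) y_{n+1} = y_n, and it is solves of this kind that a preconditioner is meant to accelerate.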
Concurrence of rule- and similarity-based mechanisms in artificial grammar learning.

PubMed

Opitz, Bertram; Hofmann, Juliane

2015-03-01

A current theoretical debate concerns whether rule-based or similarity-based learning prevails during artificial grammar learning (AGL). Although the majority of findings are consistent with a similarity-based account of AGL, it has been argued that these results were obtained only after limited exposure to study exemplars, and performance on subsequent grammaticality judgment tests has often been barely above chance level. In three experiments the conditions were investigated under which rule- and similarity-based learning could be applied. Participants were exposed to exemplars of an artificial grammar under different (implicit and explicit) learning instructions. The analysis of receiver operating characteristics (ROC) during a final grammaticality judgment test revealed that explicit but not implicit learning led to rule knowledge. It also demonstrated that this knowledge base is built up gradually while similarity knowledge governed the initial state of learning. Together these results indicate that rule- and similarity-based mechanisms concur during AGL. Moreover, it could be speculated that two different rule processes might operate in parallel: bottom-up learning via gradual rule extraction and top-down learning via rule testing. Crucially, the latter is facilitated by performance feedback that encourages explicit hypothesis testing. Copyright © 2015 Elsevier Inc. All rights reserved.
Memory formation during anaesthesia: plausibility of a neurophysiological basis

PubMed Central

Veselis, R. A.

2015-01-01

As opposed to conscious, personally relevant (explicit) memories that we can recall at will, implicit (unconscious) memories are prototypical of ‘hidden’ memory; memories that exist, but that we do not know we possess. Nevertheless, our behaviour can be affected by these memories; in fact, these memories allow us to function in an ever-changing world. It is still unclear from behavioural studies whether similar memories can be formed during anaesthesia. Thus, a relevant question is whether implicit memory formation is a realistic possibility during anaesthesia, considering the underlying neurophysiology. A different conceptualization of memory taxonomy is presented, the serial-parallel-independent (SPI) model of Tulving, which focuses on dynamic information processing with interactions among different memory systems rather than static classification of different types of memories. The neurophysiological basis for subliminal information processing is considered in the context of brain function as embodied in network interactions. The function of sensory cortices and thalamic activity during anaesthesia are reviewed. The role of sensory and perisensory cortices, in particular the auditory cortex, in support of memory function is discussed. Although improbable, with the current knowledge of neurophysiology one cannot rule out the possibility of memory formation during anaesthesia. PMID:25735711
Scalable Preconditioners for Structure Preserving Discretizations of Maxwell Equations in First Order Form

DOE Office of Scientific and Technical Information (OSTI.GOV)

Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.

Here, we report that multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure-preserving (also termed physics-compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell-type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.
Aging and IQ effects on associative recognition and priming in item recognition

PubMed Central

McKoon, Gail; Ratcliff, Roger

2012-01-01

Two ways to examine memory for associative relationships between pairs of words were tested: an explicit method, associative recognition, and an implicit method, priming in item recognition. In an experiment with both kinds of tests, participants were asked to learn pairs of words. For the explicit test, participants were asked to decide whether two words of a test pair had been studied in the same or different pairs. For the implicit test, participants were asked to decide whether single words had or had not been among the studied pairs. Some test words were immediately preceded in the test list by the other word of the same pair and some by a word from a different pair. Diffusion model (Ratcliff, 1978; Ratcliff & McKoon, 2008) analyses were carried out for both tasks for college-age participants, 60–74 year olds, and 75–90 year olds, and for higher- and lower-IQ participants, in order to compare the two measures of associative strength. Results showed parallel behavior of drift rates for associative recognition and priming across ages and across IQ, indicating that they are based, at least to some degree, on the same information in memory. PMID:24976676

Nonlinear 3D visco-resistive MHD modeling of fusion plasmas: a comparison between numerical codes

NASA Astrophysics Data System (ADS)

Bonfiglio, D.; Chacon, L.; Cappello, S.

2008-11-01

Fluid plasma models (and, in particular, the MHD model) are extensively used in the theoretical description of laboratory and astrophysical plasmas. We present here a successful benchmark between two nonlinear, three-dimensional, compressible visco-resistive MHD codes. One is the fully implicit, finite volume code PIXIE3D [1,2], which is characterized by many attractive features, notably the generalized curvilinear formulation (which makes the code applicable to different geometries) and the possibility to include in the computation the energy transport equation and the extended MHD version of Ohm's law. In addition, the parallel version of the code features excellent scalability properties. Results from this code, obtained in cylindrical geometry, are compared with those produced by the semi-implicit cylindrical code SpeCyl, which uses finite differences radially, and spectral formulation in the other coordinates [3]. Both single and multi-mode simulations are benchmarked, regarding both reversed field pinch (RFP) and ohmic tokamak magnetic configurations. [1] L. Chacon, Computer Physics Communications 163, 143 (2004). [2] L. Chacon, Phys. Plasmas 15, 056103 (2008). [3] S. Cappello, Plasma Phys. Control. Fusion 46, B313 (2004) & references therein.
Dissociation between Conceptual and Perceptual Implicit Memory: Evidence from Patients with Frontal and Occipital Lobe Lesions

PubMed Central

Gong, Liang; Wang, JiHua; Yang, XuDong; Feng, Lei; Li, Xiu; Gu, Cui; Wang, MeiHong; Hu, JiaYun; Cheng, Huaidong

2016-01-01

The latest neuroimaging studies about implicit memory (IM) have revealed that different IM types may be processed by different parts of the brain. However, studies have rarely examined what subtypes of IM processes are affected in patients with various brain injuries. Twenty patients with frontal lobe injury, 25 patients with occipital lobe injury, and 29 healthy controls (HC) were recruited for the study. Two subtypes of IM were investigated by using structurally parallel perceptual (picture identification task) and conceptual (category exemplar generation task) IM tests in the three groups, as well as explicit memory (EM) tests. The results indicated that the priming of conceptual IM and EM tasks in patients with frontal lobe injury was poorer than that observed in HC, while perceptual IM was identical between the two groups. By contrast, the priming of perceptual IM in patients with occipital lobe injury was poorer than that in HC, whereas the priming of conceptual IM and EM was similar to that in HC. This double dissociation between perceptual and conceptual IM across the brain areas implies that occipital lobes may participate in perceptual IM, while frontal lobes may be involved in processing conceptual memory. PMID:26793093
A finite element solver for 3-D compressible viscous flows

NASA Technical Reports Server (NTRS)

Reddy, K. C.; Reddy, J. N.; Nayani, S.

1990-01-01

Computation of the flow field inside a space shuttle main engine (SSME) requires the application of state-of-the-art computational fluid dynamics (CFD) technology. Several computer codes are under development to solve 3-D flow through the hot gas manifold. Some algorithms were designed to solve the unsteady compressible Navier-Stokes equations, either by implicit or explicit factorization methods, using several hundred or thousands of time steps to reach a steady state solution. A new iterative algorithm is being developed for the solution of the implicit finite element equations without assembling global matrices. It is an efficient iteration scheme based on a modified nonlinear Gauss-Seidel iteration with symmetric sweeps. The algorithm is analyzed for a model equation and is shown to be unconditionally stable. Results from a series of test problems are presented. The finite element code was tested for Couette flow, which is flow under a pressure gradient between two parallel plates in relative motion. Another problem that was solved is viscous laminar flow over a flat plate. The general 3-D finite element code was used to compute the flow in an axisymmetric turnaround duct at low Mach numbers.
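A symmetric Gauss-Seidel sweep alternates a forward pass and a backward pass over the unknowns, updating each one in place from its neighbours. The sketch below applies that idea matrix-free (no global matrix is stored, each row is rebuilt from a stencil) to a hypothetical linear, constant-coefficient tridiagonal system; the paper's scheme is a modified nonlinear variant for finite element equations, which is not reproduced here.

```python
from typing import List, Tuple


def sgs_solve(b: List[float], diag: float = 4.0, off: float = -1.0,
              tol: float = 1e-12, max_sweeps: int = 200) -> Tuple[List[float], int]:
    """Symmetric Gauss-Seidel for the stencil off*x[i-1] + diag*x[i] + off*x[i+1] = b[i].

    Returns (solution, number of symmetric sweeps used).
    """
    n = len(b)
    x = [0.0] * n

    def neighbours(i: int) -> float:
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        return left + right

    def relax(i: int) -> None:
        # Solve row i for x[i] using the latest available neighbour values.
        x[i] = (b[i] - off * neighbours(i)) / diag

    def residual_norm() -> float:
        return max(abs(b[i] - (diag * x[i] + off * neighbours(i))) for i in range(n))

    for sweep in range(1, max_sweeps + 1):
        for i in range(n):            # forward sweep
            relax(i)
        for i in reversed(range(n)):  # backward sweep
            relax(i)
        if residual_norm() < tol:
            return x, sweep
    return x, max_sweeps


b = [1.0] * 10
x, sweeps = sgs_solve(b)
print(sweeps, x[:3])
```

The backward pass makes the combined sweep symmetric, which is the property the "symmetric sweeps" in the abstract refer to.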
Spacecraft charging analysis with the implicit particle-in-cell code iPic3D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deca, J.; Lapenta, G.; Marchand, R.

2013-10-15

We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter makes it possible to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation results with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.
Distributing Earthquakes Among California's Faults: A Binary Integer Programming Approach

NASA Astrophysics Data System (ADS)

Geist, E. L.; Parsons, T.

2016-12-01

The statement of the problem is simple: given regional seismicity specified by a Gutenberg-Richter (G-R) relation, how are earthquakes distributed to match observed fault-slip rates? The objective is to determine the magnitude-frequency relation on individual faults. The California statewide G-R b-value and a-value are estimated from historical seismicity, with the a-value accounting for off-fault seismicity. UCERF3 consensus slip rates are used; these are based on geologic and geodetic data and include estimates of coupling coefficients. The binary integer programming (BIP) problem is set up such that each earthquake from a synthetic catalog spanning millennia can occur at any location along any fault. The decision vector, therefore, consists of binary variables, with values equal to one indicating the location of each earthquake that results in an optimal match of slip rates, in an L1-norm sense. Rupture area and slip associated with each earthquake are determined from a magnitude-area scaling relation. Uncertainty bounds on the UCERF3 slip rates provide explicit minimum and maximum constraints to the BIP model, with the former more important to feasibility of the problem. There is a maximum magnitude limit associated with each fault, based on fault length, providing an implicit constraint. Solution of integer programming problems with a large number of variables (>10^5 in this study) has been possible only since the late 1990s. In addition to the classic branch-and-bound technique used for these problems, several other algorithms have been recently developed, including pre-solving, sifting, cutting planes, heuristics, and parallelization. An optimal solution is obtained using a state-of-the-art BIP solver for M≥6 earthquakes and California's faults with slip-rates > 1 mm/yr. Preliminary results indicate a surprising diversity of on-fault magnitude-frequency relations throughout the state.
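The Gutenberg-Richter relation referred to above, log10 N(≥M) = a - bM, fixes how many events of each magnitude a synthetic catalog should contain before the BIP step places them on faults. The sketch below uses hypothetical values a = 4.0 and b = 1.0 (not the California estimates) and only tallies expected counts per magnitude bin; it does not attempt the integer-programming assignment itself.

```python
from typing import List, Tuple


def gr_rate(m: float, a: float, b: float) -> float:
    """Annual rate of events with magnitude >= m, from log10 N = a - b*m."""
    return 10.0 ** (a - b * m)


def binned_counts(a: float, b: float, m_min: float, m_max: float,
                  dm: float, years: float) -> List[Tuple[float, float]]:
    """Expected number of events in each bin [m, m + dm) over a catalog of `years` years."""
    counts = []
    n_bins = round((m_max - m_min) / dm)
    for k in range(n_bins):
        m = m_min + k * dm
        rate_in_bin = gr_rate(m, a, b) - gr_rate(m + dm, a, b)
        counts.append((m, rate_in_bin * years))
    return counts


# Hypothetical parameters for illustration only.
bins = binned_counts(a=4.0, b=1.0, m_min=6.0, m_max=8.0, dm=0.5, years=1000.0)
total = sum(n for _, n in bins)
for m, n in bins:
    print(f"M {m:.1f}-{m + 0.5:.1f}: {n:.2f} events")
print(f"total M6-8: {total:.2f}")
```

Because the per-bin counts telescope, their sum equals years × (N(≥6) - N(≥8)), which is 1000 × (10^-2 - 10^-4) = 9.9 events for these made-up parameters.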
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

DOE PAGES

Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

2013-01-01

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation: additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance by up to 3X compared to the Intel OpenMP task scheduler.
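The definition of work time inflation above suggests a simple accounting: the thread-seconds available in a parallel run split into the sequential work, inflation, idleness, and whatever remains, which this sketch attributes to scheduling overhead. The per-thread timings below are made up, and the function is a simplified model of that definition, not the authors' measurement framework.

```python
from typing import Dict, List


def lost_efficiency_breakdown(wall_s: float, serial_work_s: float,
                              work_s: List[float], idle_s: List[float]) -> Dict[str, float]:
    """Split n_threads * wall time into inflation, idleness and residual overhead.

    work_s[t]: time thread t spent executing task bodies in the parallel run.
    idle_s[t]: time thread t spent waiting with no task to run.
    """
    capacity = len(work_s) * wall_s               # thread-seconds available
    inflation = sum(work_s) - serial_work_s       # extra time for the same work
    idleness = sum(idle_s)
    overhead = capacity - sum(work_s) - idleness  # remainder, e.g. scheduling
    return {
        "inflation": inflation,
        "idleness": idleness,
        "overhead": overhead,
        "efficiency": serial_work_s / capacity,
    }


# Hypothetical measurements: 4 threads, 3.5 s wall time, 10 s of sequential work.
stats = lost_efficiency_breakdown(wall_s=3.5, serial_work_s=10.0,
                                  work_s=[3.0] * 4, idle_s=[0.25] * 4)
print(stats)
```

With these made-up numbers, 4.0 of the 14.0 available thread-seconds are lost, and inflation accounts for 2.0 of them.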
Distributed and parallel Ada and the Ada 9X recommendations

NASA Technical Reports Server (NTRS)

Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

1992-01-01

Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometime in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are composed of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major language issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including the shared-memory systems Alliant FX/8, Sequent, and MultiMax and the non-shared-memory Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
A GPU-accelerated semi-implicit fractional-step method for numerical solutions of incompressible Navier-Stokes equations

NASA Astrophysics Data System (ADS)

Ha, Sanghyun; Park, Junshin; You, Donghyun

2018-01-01

The utility of the computational power of Graphics Processing Units (GPUs) is elaborated for solutions of incompressible Navier-Stokes equations which are integrated using a semi-implicit fractional-step method. The Alternating Direction Implicit (ADI) and the Fourier-transform-based direct solution methods used in the semi-implicit fractional-step method take advantage of multiple tridiagonal matrices whose inversion is known as the major bottleneck for acceleration on a typical multi-core machine. A novel implementation of the semi-implicit fractional-step method designed for GPU acceleration of the incompressible Navier-Stokes equations is presented. Aspects of the programming model of Compute Unified Device Architecture (CUDA) which are critical to the bandwidth-bound nature of the present method are discussed in detail. A data layout for efficient use of CUDA libraries is proposed for acceleration of tridiagonal matrix inversion and fast Fourier transform. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are computed on a GPU. Performance of the present method using CUDA is assessed by comparing the speed of solving three tridiagonal matrices using ADI with the speed of solving one heptadiagonal matrix using a conjugate gradient method. An overall speedup of 20 times is achieved using a Tesla K40 GPU in comparison with a single-core Xeon E5-2660 v3 CPU in simulations of turbulent boundary-layer flow over a flat plate conducted on over 134 million grids. A speedup of 48 times is reached for the same problem using a Tesla P100 GPU.
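Each ADI sweep described above reduces to many independent tridiagonal systems, one per grid line. The sketch below shows the sequential Thomas algorithm for one such system and a batched loop over several; it is a CPU illustration of the work involved, not the paper's CUDA implementation or data layout. On a GPU, independent systems can be assigned to separate threads, or a parallel method such as cyclic reduction can be used within each system.

```python
from typing import List, Sequence, Tuple

System = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def thomas_solve(a: Sequence[float], b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system in O(n).

    a: sub-diagonal (a[0] ignored), b: diagonal, c: super-diagonal (c[-1] ignored),
    d: right-hand side. Assumes no pivoting is needed (e.g. diagonally dominant).
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_batch(systems: List[System]) -> List[List[float]]:
    """Solve independent systems one after another (the part a GPU would parallelize)."""
    return [thomas_solve(*s) for s in systems]


n = 5
a = [0.0] + [-1.0] * (n - 1)
b = [4.0] * n
c = [-1.0] * (n - 1) + [0.0]
systems = [(a, b, c, [float(k + i) for i in range(n)]) for k in range(3)]
solutions = solve_batch(systems)
print(solutions[0])
```

The matrix values and right-hand sides here are arbitrary diagonally dominant test data, chosen so that the pivot-free elimination is safe.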
Multibody dynamics model building using graphical interfaces

NASA Technical Reports Server (NTRS)

Macala, Glenn A.

1989-01-01

In recent years, the extremely laborious task of manually deriving equations of motion for the simulation of multibody spacecraft dynamics has largely been eliminated. Instead, the dynamicist now works with commonly available general purpose dynamics simulation programs which generate the equations of motion either explicitly or implicitly via computer codes. The user interface to these programs has predominantly been via input data files, each with its own required format and peculiarities, causing errors and frustrations during program setup. Recent progress in a more natural method of data input for dynamics programs, the graphical interface, is described.
Additional development of the XTRAN3S computer program

NASA Technical Reports Server (NTRS)

Borland, C. J.

1989-01-01

Additional developments and enhancements to the XTRAN3S computer program, a code for calculation of steady and unsteady aerodynamics, and associated aeroelastic solutions, for 3-D wings in the transonic flow regime are described. Algorithm improvements for the XTRAN3S program were provided including an implicit finite difference scheme to enhance the allowable time step and vectorization for improved computational efficiency. The code was modified to treat configurations with a fuselage, multiple stores/nacelles/pylons, and winglets. Computer program changes (updates) for error corrections and updates for version control are provided.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele

2001-01-01

This viewgraph presentation provides information on support sources available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence ensuring that the code transformation was accurate.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
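The abstract does not spell out the modified load-balancing algorithm, so the sketch below substitutes a standard generic heuristic, longest-processing-time-first (LPT) greedy assignment, for distributing work units of uneven cost (for example, batches of rays) across processors. The task costs are hypothetical estimates, and this is not the thesis's algorithm.

```python
import heapq
from typing import List, Tuple


def lpt_assign(task_costs: List[float], n_workers: int) -> Tuple[List[List[int]], List[float]]:
    """Greedy LPT: give each task, largest first, to the currently least-loaded worker.

    Returns (task indices per worker, total load per worker).
    """
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(n_workers)]
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for idx in order:
        load, w = heapq.heappop(heap)  # least-loaded worker (ties broken by id)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + task_costs[idx], w))
    loads = [0.0] * n_workers
    for load, w in heap:
        loads[w] = load
    return assignment, loads


costs = [7.0, 5.0, 4.0, 3.0, 3.0, 2.0]  # hypothetical per-task cost estimates
assignment, loads = lpt_assign(costs, n_workers=2)
print(assignment, loads)
```

LPT is a simple static scheme; on m identical processors its largest load is at most 4/3 - 1/(3m) times the optimum (Graham's bound), which is often adequate when cost estimates are reasonable.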
Evaluating Liberal Learning: Doubts and Explorations.

ERIC Educational Resources Information Center

Green, Thomas F.

1982-01-01

In current evaluation practice, the implicit philosophy of value, appraisal, and action is seen as a form of Benthamite utilitarianism. A domain of value called "educational worth" is described. Ways of detecting the presence of educational worth in liberal learning programs are identified. (MLW)
Effort and trust: the underpinnings of active learning.

PubMed

Adams, Seana; Bilimoria, Krish; Malhotra, Neha; Rangachari, P K

2017-09-01

Three undergraduate students and their teacher discuss two crucial issues that form the implicit basis of active learning: effort and trust. They use a single course in a Health Sciences Program to anchor their comments. Copyright © 2017 the American Physiological Society.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

1997-01-01

Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
A Newton-Krylov method with an approximate analytical Jacobian for implicit solution of Navier-Stokes equations on staggered overset-curvilinear grids with immersed boundaries.

PubMed

Asgharzadeh, Hafez; Borazjani, Iman

2017-02-15

The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restrictions and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for nonlinear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate, but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are among the most advanced iterative methods and can be combined with Newton methods, i.e., Newton-Krylov methods (NKMs), to solve nonlinear systems of equations. The success of NKMs depends heavily on the scheme for forming the Jacobian: automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for the matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against the Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90-degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation, depending on the grid size and the flow problem.
In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed-point Runge-Kutta method because it converged with larger time steps and in approximately 30% fewer iterations, even when the grid was stretched and the Reynolds number was increased. In fact, stretching the grid decreased the performance of all methods, but when the stretching factor was increased, the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than that of the NKM with a diagonal and a full Jacobian, respectively. The NKM with a diagonal analytical Jacobian and the matrix-free method with an analytical preconditioner are the fastest methods, and which of the two is superior depends on the flow problem. Furthermore, the implemented methods are fully parallelized, with parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.

A Newton–Krylov method with an approximate analytical Jacobian for implicit solution of Navier–Stokes equations on staggered overset-curvilinear grids with immersed boundaries

PubMed Central

Asgharzadeh, Hafez; Borazjani, Iman

2016-01-01

The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restrictions and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for nonlinear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate, but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are among the most advanced iterative methods and can be combined with Newton methods, i.e., Newton-Krylov methods (NKMs), to solve nonlinear systems of equations. The success of NKMs depends heavily on the scheme for forming the Jacobian: automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for the matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against the Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90-degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation, depending on the grid size and the flow problem.
In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42–74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed-point Runge-Kutta method because it converged with larger time steps and in approximately 30% fewer iterations, even when the grid was stretched and the Reynolds number was increased. In fact, stretching the grid decreased the performance of all methods, but when the stretching factor was increased, the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than that of the NKM with a diagonal and a full Jacobian, respectively. The NKM with a diagonal analytical Jacobian and the matrix-free method with an analytical preconditioner are the fastest methods, and which of the two is superior depends on the flow problem. Furthermore, the implemented methods are fully parallelized, with parallel efficiency of 80–90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future. PMID:28042172
A Newton-Krylov method with an approximate analytical Jacobian for implicit solution of Navier-Stokes equations on staggered overset-curvilinear grids with immersed boundaries

NASA Astrophysics Data System (ADS)

Asgharzadeh, Hafez; Borazjani, Iman

2017-02-01

The explicit and semi-implicit schemes in flow simulations involving complex geometries and moving boundaries suffer from time-step size restrictions and low convergence rates. Implicit schemes can be used to overcome these restrictions, but implementing them to solve the Navier-Stokes equations is not straightforward due to their non-linearity. Among the implicit schemes for non-linear equations, Newton-based techniques are preferred over fixed-point techniques because of their high convergence rate, but each Newton iteration is more expensive than a fixed-point iteration. Krylov subspace methods are among the most advanced iterative methods and can be combined with Newton methods, i.e., Newton-Krylov methods (NKMs), to solve non-linear systems of equations. The success of NKMs depends heavily on the scheme for forming the Jacobian: automatic differentiation is very expensive, and matrix-free methods without a preconditioner slow down as the mesh is refined. A novel, computationally inexpensive analytical Jacobian for NKM is developed to solve unsteady incompressible Navier-Stokes momentum equations on staggered overset-curvilinear grids with immersed boundaries. Moreover, the analytical Jacobian is used to form a preconditioner for the matrix-free method in order to improve its performance. The NKM with the analytical Jacobian was validated and verified against the Taylor-Green vortex, inline oscillations of a cylinder in a fluid initially at rest, and pulsatile flow in a 90-degree bend. The capability of the method in handling complex geometries with multiple overset grids and immersed boundaries is shown by simulating an intracranial aneurysm. It was shown that the NKM with an analytical Jacobian is 1.17 to 14.77 times faster than the fixed-point Runge-Kutta method, and 1.74 to 152.3 times (excluding an intensively stretched grid) faster than automatic differentiation, depending on the grid size and the flow problem.
In addition, it was shown that using only the diagonal of the Jacobian further improves the performance by 42-74% compared to the full Jacobian. The NKM with an analytical Jacobian showed better performance than the fixed-point Runge-Kutta method because it converged with larger time steps and in approximately 30% fewer iterations, even when the grid was stretched and the Reynolds number was increased. In fact, stretching the grid decreased the performance of all methods, but when the stretching factor was increased, the fixed-point Runge-Kutta performance decreased 4.57 and 2.26 times more than that of the NKM with a diagonal and a full Jacobian, respectively. The NKM with a diagonal analytical Jacobian and the matrix-free method with an analytical preconditioner are the fastest methods, and which of the two is superior depends on the flow problem. Furthermore, the implemented methods are fully parallelized, with parallel efficiency of 80-90% on the problems tested. The NKM with the analytical Jacobian can guide building preconditioners for other techniques to improve their performance in the future.
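The Newton-Krylov idea summarized in this record can be illustrated with a minimal Python/NumPy sketch. It is a generic Jacobian-free Newton-Krylov solver, not the authors' analytical-Jacobian code: Jacobian-vector products are approximated by forward finite differences, each Newton step is solved with an unrestarted GMRES, and an optional diagonal (Jacobi) preconditioner built from an analytically derived Jacobian diagonal stands in, loosely, for the analytical preconditioner described above. The test problem (a discretized 1D equation -u'' + u^3 = 1 with zero boundary values) and all names (`gmres`, `newton_krylov`, `F`, `diag_jac`) are hypothetical.

```python
import numpy as np


def gmres(matvec, b, tol=1e-10, max_iter=50, precond=None):
    """Unrestarted GMRES for A x = b, with A available only via matvec(v).

    precond(v) approximates A^{-1} v and is applied as a right preconditioner.
    The initial guess is x0 = 0.
    """
    M = precond if precond is not None else (lambda v: v)
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros_like(b)
    m = min(max_iter, b.size)
    V = np.zeros((b.size, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    for j in range(m):
        w = matvec(M(V[:, j]))
        for i in range(j + 1):  # Arnoldi step with modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        # Minimize ||beta*e1 - H_j y|| over the current Krylov subspace
        y = np.linalg.lstsq(H[: j + 2, : j + 1], e1, rcond=None)[0]
        residual = np.linalg.norm(e1 - H[: j + 2, : j + 1] @ y)
        if residual <= tol * beta or H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return M(V[:, : j + 1] @ y)  # undo the right preconditioning


def newton_krylov(F, u0, diag_jac=None, tol=1e-10, max_newton=30, eps=1e-7):
    """Inexact Newton iteration; each step solves J(u) du = -F(u) with GMRES."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break

        def jv(v, u=u, r=r):
            # Matrix-free product J(u) v ~ (F(u + s v) - F(u)) / s
            s = eps / max(np.linalg.norm(v), 1e-30)
            return (F(u + s * v) - r) / s

        precond = None
        if diag_jac is not None:
            d = diag_jac(u)
            precond = lambda v, d=d: v / d  # Jacobi (diagonal) preconditioner
        u = u + gmres(jv, -r, tol=1e-8, precond=precond)
    return u


# Hypothetical test problem: -u'' + u^3 = 1 on (0, 1), u(0) = u(1) = 0
n = 20
h = 1.0 / (n + 1)


def F(u):
    lap = 2.0 * u
    lap[1:] -= u[:-1]
    lap[:-1] -= u[1:]
    return lap / h**2 + u**3 - 1.0


def diag_jac(u):
    # Analytical diagonal of dF/du
    return 2.0 / h**2 + 3.0 * u**2


u_plain = newton_krylov(F, np.zeros(n))
u_prec = newton_krylov(F, np.zeros(n), diag_jac=diag_jac)
```

Switching `diag_jac` on or off only changes how quickly GMRES converges inside each Newton step; both runs should reach the same discrete solution.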
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems that require the use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing network of Sun SPARC machines with PVM (Parallel Virtual Machine), a software system for linking clusters of machines. Second, a set of three basic applications was selected: a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor for performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps), which will allow us to extend our study to newer applications, performance metrics, and configurations.
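The speedup metric mentioned in this abstract, and the observation that communication latency favors coarse-grain parallelism, can be made concrete with a toy cost model in Python. The model (perfectly divisible computation plus a fixed latency per message) and all numbers are assumptions for illustration, not measurements from the PVM testbed.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p from measured elapsed (wall-clock) times."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / p


def modeled_time(t_compute, p, messages_per_proc, latency):
    """Hypothetical cost model: compute divides evenly, each message costs a fixed latency."""
    return t_compute / p + messages_per_proc * latency


T1 = 100.0       # serial elapsed time in seconds (hypothetical)
LATENCY = 0.01   # per-message latency in seconds (hypothetical)
P = 4            # number of workstations

coarse = modeled_time(T1, P, messages_per_proc=10, latency=LATENCY)
fine = modeled_time(T1, P, messages_per_proc=1000, latency=LATENCY)
print(f"coarse grain: S = {speedup(T1, coarse):.2f}, E = {efficiency(T1, coarse, P):.2f}")
print(f"fine grain:   S = {speedup(T1, fine):.2f}, E = {efficiency(T1, fine, P):.2f}")
```

Under these assumed numbers the coarse-grained run reaches a speedup close to 4, while the fine-grained run, with the same total computation, falls below 3 because latency dominates.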
WFIRST: Science from the Guest Investigator and Parallel Observation Programs

NASA Astrophysics Data System (ADS)

Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua

2018-01-01

The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects, surveys of the outer Milky Way halo, comprehensive studies of cluster galaxies, to unique and new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe. WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees at no impact to the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking

NASA Astrophysics Data System (ADS)

Hussein, I.; MacMillan, R.

2014-09-01

Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks, such as searching for new targets and classifying and tracking known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system, as in the Space Situational Awareness problem, these capabilities require substantial computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as possible in a desktop parallel environment. Performance measures will include the number of targets in the simulation, the certainty of detected target tracks, and computational time as a function of clutter returns and number of targets, among other factors.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures

PubMed Central

Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone

2018-01-01

We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Paul T.; Shadid, John N.; Sala, Marzio

In this study, results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is composed of a source-term-dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentrations. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system is obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10^8 unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.
Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Doughty, Christine; Wu, Yu-Shu

2007-01-01

An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single- and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

PubMed Central

Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is open source and free software, available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
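The direct-execution approach can be sketched in Python as a small sequential discrete-event simulator in which each process's computation bursts are given as durations (standing in for code that LAPSE would actually execute) and only the communication is simulated. The operation format, the fixed-latency network model, and the name `simulate` are hypothetical simplifications; unlike LAPSE, this sketch is itself not parallelized.

```python
import heapq
from collections import defaultdict, deque


def simulate(programs, latency):
    """Predict per-process finish times for a message-passing program.

    programs[p] is a list of operations for process p:
      ("compute", seconds): a directly executed burst (here just its duration)
      ("send", dst): message to process dst, delivered after `latency`
      ("recv", src): block until the next message from src has arrived
    """
    clock = [0.0] * len(programs)   # virtual time of each process
    pc = [0] * len(programs)        # index of each process's next operation
    mailbox = defaultdict(deque)    # (src, dst) -> FIFO of arrival times
    blocked_on = {}                 # dst -> src it is waiting for
    events = [(0.0, p) for p in range(len(programs))]
    heapq.heapify(events)
    while events:
        t, p = heapq.heappop(events)
        clock[p] = max(clock[p], t)
        while pc[p] < len(programs[p]):
            op, arg = programs[p][pc[p]]
            if op == "compute":
                clock[p] += arg
            elif op == "send":
                arrival = clock[p] + latency
                mailbox[(p, arg)].append(arrival)
                if blocked_on.get(arg) == p:  # wake the receiver at arrival time
                    del blocked_on[arg]
                    heapq.heappush(events, (arrival, arg))
            elif op == "recv":
                queue = mailbox[(arg, p)]
                if not queue:
                    blocked_on[p] = arg  # suspend until a matching send
                    break
                clock[p] = max(clock[p], queue.popleft())
            else:
                raise ValueError(f"unknown operation {op!r}")
            pc[p] += 1
    return clock


ping_pong = [
    [("compute", 1.0), ("send", 1), ("recv", 1)],  # process 0
    [("recv", 0), ("compute", 2.0), ("send", 0)],  # process 1
]
finish = simulate(ping_pong, latency=0.5)
```

Because each receive is matched to a specific sender in FIFO order, the predicted times do not depend on the order in which the simulator happens to advance the processes.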
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
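The change described in this abstract, replacing a sequential summation with a binary-tree reduction, is sketched below in Python rather than Fortran: list entries stand in for the partial sums held by coarray images, and `tree_sum` is a hypothetical name. In each round all pairs combine independently, so p images need ceil(log2 p) rounds instead of p - 1 additions performed one after another. In Fortran 2018 (the standard still called Fortran 2015 in the abstract), the intrinsic collective subroutine `co_sum` provides this kind of reduction directly.

```python
def tree_sum(values):
    """Binary-tree reduction over 'images'; returns (total, number_of_rounds)."""
    partial = list(values)  # partial[i] is the value held by image i
    n = len(partial)
    if n == 0:
        return 0, 0
    stride, rounds = 1, 0
    while stride < n:
        # Within one round the pairs (i, i + stride) are independent,
        # so on real images all of these additions would proceed in parallel.
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                partial[i] += partial[i + stride]
        stride *= 2
        rounds += 1
    return partial[0], rounds


total, rounds = tree_sum(range(1, 33))  # 32 "images" holding 1..32
```

For 32 images the sum is obtained in 5 rounds, whereas a sequential summation at a single image would need 31 dependent additions.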
The Definition and Implementation of a Computer Programming Language Based on Constraints.

DTIC Science & Technology

1980-08-01

though not quite reached, is a complete programming system which will implicitly support the constraint paradigm to the same extent that LISP, say...and detecting and resolving conflicts, just as LISP provides certain services such as automatic storage management, which records given data in a...defined; it permits the statement of equalities and some simple arithmetic relationships. An implementation representation is chosen, and LISP code for a
On a difficulty in eigenfunction expansion solutions for the start-up of fluid flow

NASA Astrophysics Data System (ADS)

Christov, Ivan C.

2015-11-01

Most mathematics and engineering textbooks describe the process of "subtracting off" the steady state of a linear parabolic partial differential equation as a technique for obtaining a boundary-value problem with homogeneous boundary conditions that can be solved by separation of variables (i.e., eigenfunction expansions). While this method produces the correct solution for the start-up of the flow of, e.g., a Newtonian fluid between parallel plates, it can lead to erroneous solutions to the corresponding problem for a class of non-Newtonian fluids. We show that the reason for this is the non-rigorous enforcement of the start-up condition in the textbook approach, which leads to a violation of the principle of causality. Nevertheless, these boundary-value problems can be solved correctly using eigenfunction expansions, and we present the formulation that makes this possible (in essence, an application of Duhamel's principle). The solutions obtained by this new approach are shown to agree identically with those obtained by using the Laplace transform in time only, a technique that enforces the proper start-up condition implicitly (hence, the same error cannot be committed). Supported, in part, by NSF Grant DMS-1104047 and the U.S. DOE (Contract No. DE-AC52-06NA25396) through the LANL/LDRD Program.
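For the Newtonian case, where the abstract notes that the textbook "subtracting off" approach is correct, the procedure can be illustrated in Python for start-up Couette flow: the lower plate at y = 0 is impulsively set to speed U, the upper plate at y = h is held fixed, and the transient left after subtracting the steady profile U(1 - y/h) is expanded in sine eigenfunctions. The geometry, the parameter names, and the truncation `n_terms` are assumptions for illustration; the non-Newtonian case and the Duhamel-based formulation from the abstract are not reproduced here.

```python
import math


def startup_couette(y, t, U=1.0, h=1.0, nu=1.0, n_terms=2000):
    """Newtonian start-up Couette flow by 'subtracting off' the steady state.

    u(y, t) = U (1 - y/h) - sum_n 2U/(n pi) sin(n pi y/h) exp(-nu (n pi/h)^2 t)

    The sine series of the transient equals -U (1 - y/h) at t = 0,
    so the fluid starts at rest for 0 < y < h.
    """
    steady = U * (1.0 - y / h)
    transient = 0.0
    for n in range(1, n_terms + 1):
        k = n * math.pi / h  # eigenvalue sqrt of the n-th sine mode
        transient += 2.0 * U / (n * math.pi) * math.sin(k * y) * math.exp(-nu * k * k * t)
    return steady - transient


u_mid_start = startup_couette(0.5, 0.0, n_terms=20000)  # close to 0: fluid at rest
u_late = startup_couette(0.25, 10.0)                     # close to the steady 0.75
```

At t = 0 the series converges slowly (with Gibbs oscillations near the moving plate), which is why many terms are needed there; for t > 0 the exponential factors damp the high modes quickly.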
Evaluation of Finite-Rate Gas/Surface Interaction Models for a Carbon Based Ablator

NASA Technical Reports Server (NTRS)

Chen, Yih-Kanq; Goekcen, Tahir

2015-01-01

Two sets of finite-rate gas-surface interaction models between air and the carbon surface are studied. The first set is an engineering model with one-way chemical reactions, and the second set is a more detailed model with two-way chemical reactions. These two proposed models are intended to cover the carbon surface ablation conditions, including low-temperature rate-controlled oxidation, mid-temperature diffusion-controlled oxidation, and high-temperature sublimation. The prediction of carbon surface recession is achieved by coupling a material thermal response code and a Navier-Stokes flow code. The material thermal response code used in this study is the Two-dimensional Implicit Thermal-response and Ablation Program, which predicts charring material thermal response and shape change on hypersonic space vehicles. The flow code solves the reacting full Navier-Stokes equations using the Data Parallel Line Relaxation method. Recession analyses of stagnation tests conducted in NASA Ames Research Center arc-jet facilities with heat fluxes ranging from 45 to 1100 W/cm² are performed and compared with data for model validation. The ablating material used in these arc-jet tests is Phenolic Impregnated Carbon Ablator. Additionally, computational predictions of surface recession and shape change are in good agreement with measurements for arc-jet conditions of the Small Probe Reentry Investigation for Thermal Protection System Engineering.
Programming Probabilistic Structural Analysis for Parallel Processing Computer

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

1991-01-01

The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
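The stochastic preconditioned conjugate gradient method is not specified in this abstract, so the Python/NumPy sketch below shows only the standard preconditioned conjugate gradient (PCG) iteration such a method builds on, with a Jacobi (diagonal) preconditioner. As an assumption for illustration, one preconditioner built from a hypothetical mean matrix is reused across several randomly perturbed realizations, mimicking the repeated solves of a sampling-based probabilistic analysis; the matrices, perturbations, and the name `pcg` are hypothetical.

```python
import numpy as np


def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for symmetric positive definite A.

    m_inv_diag holds the inverse of the preconditioner's diagonal (Jacobi).
    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros(len(b))
    r = b - A @ x
    z = m_inv_diag * r
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)  # step length along the search direction
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = m_inv_diag * r     # apply the preconditioner
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


# Hypothetical mean system: a diagonally dominant SPD tridiagonal matrix
n = 50
A_mean = (np.diag(2.0 + np.arange(n, dtype=float))
          - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
m_inv = 1.0 / np.diag(A_mean)  # Jacobi preconditioner from the mean matrix

rng = np.random.default_rng(0)
b = np.ones(n)
solutions = []
for _ in range(5):
    # Random positive perturbation of the diagonal (a hypothetical realization)
    A_k = A_mean + np.diag(0.1 * rng.random(n))
    x_k, iters = pcg(A_k, b, m_inv)
    solutions.append((A_k, x_k, iters))
```

Reusing one cheap preconditioner across realizations avoids rebuilding it for every sample; whether that is adequate depends on how far the realizations stray from the mean system.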
24 CFR 8.3 - Definitions.

Code of Federal Regulations, 2011 CFR

2011-04-01

... such as income as well as other explicit or implicit requirements inherent in the nature of the program... retardation, emotional illness, drug addiction and alcoholism. (b) Major life activities means functions such... with handicaps who, with reasonable accommodation, can perform the essential functions of the job in...
24 CFR 8.3 - Definitions.

Code of Federal Regulations, 2010 CFR

2010-04-01

... such as income as well as other explicit or implicit requirements inherent in the nature of the program... retardation, emotional illness, drug addiction and alcoholism. (b) Major life activities means functions such... with handicaps who, with reasonable accommodation, can perform the essential functions of the job in...
Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

NASA Technical Reports Server (NTRS)

Weeks, Cindy Lou

1986-01-01

Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers with a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Performance Implications of Synchronization Support for Parallel FORTRAN Programs

DTIC Science & Technology

1991-06-17

applications we used in this study are BDNA and FLO52. BDNA is a molecular dynamics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all... [Figure 21: cumulative percentage plot with curves for BDNA and FLO52]
Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed-memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
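This kind of gradient parallelizes well because each component of a finite-difference gradient needs its own objective evaluation, and those evaluations are independent of one another. The sketch below is a minimal illustration under several assumptions:

- It uses forward differences. The abstract does not say which difference scheme POST3D uses.
- It uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for POST3D's SPMD, distributed-memory setup.
- `objective` is a hypothetical placeholder for a trajectory simulation, not POST3D code.

```python
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    # Hypothetical stand-in for an expensive trajectory simulation.
    return sum((xi - 1.0) ** 2 for xi in x) + x[0] * x[1]


def parallel_forward_gradient(f, x, h=1e-6, workers=4):
    """Forward-difference gradient; the n perturbed evaluations run concurrently."""
    f0 = f(x)  # baseline evaluation, shared by every component

    def perturbed(i):
        xp = list(x)
        xp[i] += h
        return f(xp)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(perturbed, range(len(x))))
    return [(fi - f0) / h for fi in values]


if __name__ == "__main__":
    print(parallel_forward_gradient(objective, [0.0, 0.0, 0.0]))  # each entry close to -2
```

Threads only illustrate the decomposition here. For a CPU-bound Python objective, real speedup would need separate processes or MPI ranks, each evaluating a subset of the perturbed points.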

Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

DTIC Science & Technology

2012-12-01

...identity operation; SIMD: single instruction, multiple data stream parallel computing; Scala: a byte-compiled programming language featuring dynamic type... Contract number FA8750-10-1-0191; grant number N/A; program element number 61101E. Author: Armando Fox...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency
Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

PubMed

Bellucci, Michael A; Coker, David F

2011-07-28

We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP uses a hierarchy of genetic programming on two different levels. The lower-level genetic programs are used to optimize coevolving populations in parallel, while the higher-level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower-level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increases the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is used to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
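The abstract describes a higher level that tunes the operator probabilities of the lower-level populations. That idea can be illustrated with a much simpler, swapped-in technique: adaptive operator selection by probability matching. Each operator's selection probability tracks the fitness improvements it has recently produced.

This is not the PMLGP algorithm:

- There is no higher-level genetic program.
- The lower level evolves the two real parameters of a line rather than program trees.
- The operators, the floor `p_min` and the line-fitting data are all hypothetical choices.

```python
import random

OPERATORS = ("small_mutation", "large_mutation", "blend_crossover")


def sse(params, data):
    """Sum of squared errors of the line a*x + b against (x, y) samples."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data)


def make_child(op, parent, other, rng):
    if op == "small_mutation":
        return [p + rng.gauss(0.0, 0.05) for p in parent]
    if op == "large_mutation":
        return [p + rng.gauss(0.0, 1.0) for p in parent]
    w = rng.random()  # blend crossover: random convex combination of two parents
    return [w * p + (1.0 - w) * q for p, q in zip(parent, other)]


def evolve(data, pop_size=30, iterations=2000, p_min=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(pop_size)]
    initial_best = min(sse(p, data) for p in pop)
    k = len(OPERATORS)
    probs = {op: 1.0 / k for op in OPERATORS}
    reward = {op: 0.0 for op in OPERATORS}
    for _ in range(iterations):
        op = rng.choices(OPERATORS, weights=[probs[o] for o in OPERATORS])[0]
        i, j = rng.sample(range(pop_size), 2)
        child = make_child(op, pop[i], pop[j], rng)
        # Steady-state replacement: the child replaces the worst member if better.
        worst = max(range(pop_size), key=lambda m: sse(pop[m], data))
        gain = sse(pop[worst], data) - sse(child, data)
        if gain > 0:
            pop[worst] = child
        # Credit the operator with a moving average of its improvement over the worst member.
        reward[op] = 0.9 * reward[op] + 0.1 * max(gain, 0.0)
        total = sum(reward.values())
        if total > 0:
            # Probability matching with a floor p_min so no operator dies out.
            probs = {o: p_min + (1.0 - k * p_min) * reward[o] / total for o in OPERATORS}
    best = min(pop, key=lambda p: sse(p, data))
    return best, probs, initial_best


data = [(x, 2.0 * x - 1.0) for x in range(10)]
best, probs, initial_best = evolve(data)
print(best, probs)
```

Because a child only ever replaces the worst member, the best fitness in the population can never get worse. The adapted probabilities always sum to one.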
A high-performance model for shallow-water simulations in distributed and heterogeneous architectures

NASA Astrophysics Data System (ADS)

Conde, Daniel; Canelas, Ricardo B.; Ferreira, Rui M. L.

2017-04-01

One of the most common challenges in hydrodynamic modelling is the trade-off one must make between highly resolved simulations and the time required for their computation. In the particular case of urban floods, modelers are often forced to simplify the complex geometries of the problem, or to implicitly include some of its hydrodynamic effects, due to the typically very large spatial scales involved and limited computational resources. At CEris - Instituto Superior Técnico, Universidade de Lisboa - the STAV-2D shallow-water model, particularly suited for strong transient flows in complex and dynamic geometries, has been under development in recent years (Canelas et al., 2013 & Conde et al., 2013). The model is based on an explicit, first-order 2DH finite-volume discretization scheme for unstructured triangular meshes, in which a flux-splitting technique is paired with a reviewed Roe-Riemann solver, yielding a model applicable to discontinuous flows over time-evolving geometries. STAV-2D features solid transport in both Eulerian and Lagrangian forms, with the former aimed at describing the transport of fine natural sediments and the latter at large individual debris. The model has been validated with theoretical solutions and laboratory experiments (Canelas et al., 2013 & Conde et al., 2015). This work presents our most recent effort in STAV-2D: the re-design of the code in a modern Object-Oriented parallel framework for heterogeneous computations in CPUs and GPUs. The programming language of choice for this re-design was C++, due to its wide support of established and emerging parallel programming interfaces. The current implementation of STAV-2D provides two different levels of parallel granularity: inter-node and intra-node. Inter-node parallelism is achieved by distributing a simulation across a set of worker nodes, with communication between nodes explicitly managed through MPI. At this level, the main difficulty is associated with the unstructured nature of the mesh topology; the solution employed, based on space-filling curves, is analyzed and discussed. Intra-node parallelism is achieved through OpenMP for CPUs and CUDA for GPUs, depending on which kind of device the process is running on. Here the main difficulty is associated with the Object-Oriented approach, where the presence of complex data structures can degrade model performance considerably. STAV-2D now supports fully distributed and heterogeneous simulations where multiple different devices can be used to reduce computation time. The advantages, shortcomings and specific solutions for the employed unified Object-Oriented approach, where the source code for CPU and GPU shares the same compilation units (no device-specific branches, unlike available models), are discussed and quantified with a thorough scalability and performance analysis. The assembled parallel model is expected to achieve faster-than-real-time simulations at high resolutions (from meters to sub-meter) in large-scale problems (from cities to watersheds), effectively bridging the gap between detailed and timely simulation results.

Acknowledgements: This research was partially supported by Portuguese and European funds, within programs COMPETE2020 and PORL-FEDER, through project PTDC/ECM-HID/6387/2014 and Doctoral Grant SFRH/BD/97933/2013 granted by the National Foundation for Science and Technology (FCT).

References
Canelas, R.; Murillo, J. & Ferreira, R. M. L. (2013). Two-dimensional depth-averaged modelling of dam-break flows over mobile beds. Journal of Hydraulic Research, 51(4), 392-407.
Conde, D. A. S.; Baptista, M. A. V.; Sousa Oliveira, C. & Ferreira, R. M. L. (2013). A shallow-flow model for the propagation of tsunamis over complex geometries and mobile beds. Nat. Hazards and Earth Syst. Sci., 13, 2533-2542.
Conde, D. A. S.; Telhado, M. J.; Viana Baptista, M. A. & Ferreira, R. M. L. (2015). Severity and exposure associated with tsunami actions in urban waterfronts: the case of Lisbon, Portugal. Natural Hazards, Springer, 79, 2125. DOI:10.1007/s11069-015-1951-z
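The abstract says only that the inter-node partition of the unstructured mesh is based on space-filling curves. A minimal sketch of that idea works as follows:

1. Compute each triangle's centroid.
2. Map the centroid to a key on a Morton (Z-order) curve.
3. Sort the triangles by key.
4. Cut the sorted list into contiguous, near-equal chunks, so spatially close cells tend to share a node.

The Morton curve and the equal-count splitting are assumptions; STAV-2D's actual curve and load-balancing weights are not stated.

```python
def spread_bits(n):
    """Insert a zero bit between each of the low 16 bits of n."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_key(ix, iy):
    """Interleave two 16-bit integers into a Z-order (Morton) key."""
    return spread_bits(ix) | (spread_bits(iy) << 1)


def partition_triangles(vertices, triangles, nparts):
    """Assign each triangle to one of nparts parts by Morton order of its centroid."""
    cents = [(sum(vertices[v][0] for v in t) / 3.0,
              sum(vertices[v][1] for v in t) / 3.0) for t in triangles]
    xs = [c[0] for c in cents]
    ys = [c[1] for c in cents]
    xmin, ymin = min(xs), min(ys)
    xspan = (max(xs) - xmin) or 1.0  # guard against a degenerate extent
    yspan = (max(ys) - ymin) or 1.0
    scale = 0xFFFF
    keys = [morton_key(int((x - xmin) / xspan * scale), int((y - ymin) / yspan * scale))
            for x, y in cents]
    order = sorted(range(len(triangles)), key=lambda k: keys[k])
    owner = [0] * len(triangles)
    for rank, tri in enumerate(order):
        owner[tri] = rank * nparts // len(order)  # contiguous, near-equal chunks
    return owner


# Usage: a 4x4 block of squares, each split into two triangles, on 4 parts.
verts = [(float(x), float(y)) for y in range(5) for x in range(5)]
tris = []
for y in range(4):
    for x in range(4):
        a, b, c, d = y * 5 + x, y * 5 + x + 1, (y + 1) * 5 + x + 1, (y + 1) * 5 + x
        tris += [(a, b, c), (a, c, d)]
print(partition_triangles(verts, tris, 4))
```

On this small grid, each part receives one quadrant of the mesh. A production code would also weight cells by cost and track the halo cells shared across part boundaries for the MPI exchange.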
Concepts of Concurrent Programming

DTIC Science & Technology

1990-04-01

...to the material presented. Carriero89: Carriero, N., and Gelernter, D. "How to Write Parallel Programs: A Guide to the Perplexed." ACM...between the architectures on which programs can be executed and the application domains from which problems are drawn. Our goal is to show how programs ...Sept. 1989), 251-510. Abstract: There are four papers: 1. Programming Languages for Distributed Computing Systems (52); 2. How to Write Parallel
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei; Xu, Jingling

2006-01-01

This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrasts two methods of programming: Single Program Multiple Data (SPMD) and Navigational Programming (NavP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP. Case studies are presented. It also reviews the work being done to enable the NavP system.
High-speed extended-term time-domain simulation for online cascading analysis of power system

NASA Astrophysics Data System (ADS)

Fu, Chuan

A high-speed extended-term (HSET) time domain simulator (TDS), intended to become part of an energy management system (EMS), has been newly developed for use in online extended-term dynamic cascading analysis of power systems. HSET-TDS includes the following attributes for providing situational awareness of high-consequence events: (i) online analysis, including n-1 and n-k events, (ii) ability to simulate both fast and slow dynamics for 1-3 hours in advance, (iii) inclusion of rigorous protection-system modeling, (iv) intelligence for corrective action ID, storage, and fast retrieval, and (v) high-speed execution. Very fast online computational capability is the most desired attribute of this simulator. Based on the process of solving the differential-algebraic equations describing the dynamics of power systems, HSET-TDS seeks to develop computational efficiency at each of the following hierarchical levels: (i) hardware, (ii) strategies, (iii) integration methods, (iv) nonlinear solvers, and (v) linear solver libraries. This thesis first describes the Hammer-Hollingsworth 4 (HH4) implicit integration method. Like the trapezoidal rule, HH4 is symmetrically A-stable, but it has a higher order of accuracy (h^4) than the trapezoidal rule. This accuracy enables larger integration steps and therefore improves simulation efficiency for variable step size implementations. This thesis provides the underlying theory on which we advocate use of HH4 over other numerical integration methods for power system time-domain simulation. Second, motivated by the need to perform high-speed extended-term time domain simulation (HSET-TDS) for online purposes, this thesis presents principles for designing numerical solvers of differential-algebraic systems associated with power system time-domain simulation, including DAE construction strategies (Direct Solution Method), integration methods (HH4), nonlinear solvers (Very Dishonest Newton), and linear solvers (SuperLU).
We have implemented a design appropriate for HSET-TDS, and we compare it to various solvers, including the commercial-grade PSSE program, with respect to computational efficiency and accuracy, using as examples the New England 39-bus system, the expanded 8775-bus system, and the PJM 13029-bus system. Third, we have explored a stiffness-decoupling method, intended to be part of a parallel design of time domain simulation software for supercomputers. The stiffness-decoupling method is able to combine the advantages of implicit methods (A-stability) and explicit methods (less computation). With the new stiffness detection method proposed herein, the stiffness can be captured. The expanded 975-bus system is used to test simulation efficiency. Finally, several parallel strategies for supercomputer deployment to simulate power system dynamics are proposed and compared. Design A partitions the task via scale with the stiffness-decoupling method, waveform relaxation, and a parallel linear solver. Design B partitions the task via the time axis using a highly precise integration method, the Kuntzmann-Butcher method of order 8 (KB8). The strategy of partitioning events is designed to partition the whole simulation via the time axis through a simulated sequence of cascading events. Among all strategies proposed, partitioning cascading events is recommended, since the sub-tasks for each processor are totally independent, and therefore minimum communication time is needed.
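The Hammer-Hollingsworth fourth-order method is the two-stage Gauss-Legendre implicit Runge-Kutta scheme. The sketch below applies it, and the trapezoidal rule, to the scalar linear test equation y' = λy only. On that equation each step reduces to a 2x2 linear solve, rather than the Newton iteration a real power-system DAE would need. The function names are hypothetical.

The example makes the abstract's two claims concrete:

- Both methods stay bounded for very stiff λ (A-stability).
- HH4's error is far smaller than the trapezoidal rule's at the same step size.

```python
import math

SQRT3 = math.sqrt(3.0)
# Butcher tableau of the two-stage Gauss-Legendre (Hammer-Hollingsworth) method.
A = ((0.25, 0.25 - SQRT3 / 6.0),
     (0.25 + SQRT3 / 6.0, 0.25))
B = (0.5, 0.5)


def hh4_step(y, lam, h):
    """One HH4 step for y' = lam * y: solve (I - h*lam*A) K = lam*y*[1, 1]."""
    z = h * lam
    m11, m12 = 1.0 - z * A[0][0], -z * A[0][1]
    m21, m22 = -z * A[1][0], 1.0 - z * A[1][1]
    rhs = lam * y
    det = m11 * m22 - m12 * m21
    k1 = (rhs * m22 - m12 * rhs) / det  # Cramer's rule for the stage values
    k2 = (m11 * rhs - m21 * rhs) / det
    return y + h * (B[0] * k1 + B[1] * k2)


def trapezoidal_step(y, lam, h):
    z = h * lam
    return y * (1.0 + z / 2.0) / (1.0 - z / 2.0)


def integrate(step, y0, lam, h, nsteps):
    y = y0
    for _ in range(nsteps):
        y = step(y, lam, h)
    return y


exact = math.exp(-1.0)
for name, step in (("HH4", hh4_step), ("trapezoidal", trapezoidal_step)):
    err = abs(integrate(step, 1.0, -1.0, 0.1, 10) - exact)
    stiff = integrate(step, 1.0, -1.0e6, 0.1, 50)  # h*|lam| = 1e5, still bounded
    print(name, err, stiff)
```

For this test equation, the HH4 step multiplies y by (1 + z/2 + z²/12)/(1 - z/2 + z²/12) with z = hλ. That factor has magnitude at most one for Re z ≤ 0, which is the A-stability property shared with the trapezoidal factor (1 + z/2)/(1 - z/2).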
High Performance Programming Using Explicit Shared Memory Model on Cray T3D

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research Adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times lower than with data parallel methods. The issues involved in programming with the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM SP1, etc. is presented.
On program restructuring, scheduling, and communication for parallel processor systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Polychronopoulos, Constantine D.

1986-08-01

This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed with a single goal: to maximize program speedup or, equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into parallel form and to conduct experiments. Two new program restructuring techniques are presented: loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
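Loop coalescing, one of the two restructuring techniques named above, collapses a perfectly nested loop into a single loop. The single index is then mapped back to the original indices, which gives a scheduler one flat iteration space to divide.

The sketch below assumes a doubly nested loop with independent iterations. It pairs the coalesced loop with a simple chunked self-scheduling loop, in which idle workers grab the next chunk from a shared counter. The names and chunk size are hypothetical, and CPython threads here illustrate the scheduling logic rather than deliver speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def coalesced_order(n, m):
    """Yield (i, j) for a coalesced n x m loop: one index k, recovered via divmod."""
    for k in range(n * m):
        yield divmod(k, m)  # i = k // m, j = k % m


def run_self_scheduled(n, m, body, workers=4, chunk=8):
    """Run body(i, j) over the coalesced space; workers claim chunks on demand."""
    total = n * m
    next_k = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_k
        while True:
            with lock:  # claim the next chunk of the flat iteration space
                start = next_k
                next_k += chunk
            if start >= total:
                return
            for k in range(start, min(start + chunk, total)):
                i, j = divmod(k, m)
                body(i, j)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for f in [pool.submit(worker) for _ in range(workers)]:
            f.result()  # re-raise any exception from a worker


print(list(coalesced_order(2, 3)))
```

Coalescing turns the scheduling problem into a single counter, so one lock-protected increment hands out work regardless of the original nesting depth.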
A self-consistent phase-field approach to implicit solvation of charged molecules with Poisson-Boltzmann electrostatics

NASA Astrophysics Data System (ADS)

Sun, Hui; Wen, Jiayi; Zhao, Yanxiang; Li, Bo; McCammon, J. Andrew

2015-12-01

Dielectric-boundary-based implicit-solvent models provide efficient descriptions of coarse-grained effects, particularly the electrostatic effect, of aqueous solvent. Recent years have seen the initial success of a new such model, the variational implicit-solvent model (VISM) [Dzubiella, Swanson, and McCammon Phys. Rev. Lett. 96, 087802 (2006) and J. Chem. Phys. 124, 084905 (2006)], in capturing multiple dry and wet hydration states, describing the subtle electrostatic effect in hydrophobic interactions, and providing qualitatively good estimates of solvation free energies. Here, we develop a phase-field VISM for the solvation of charged molecules in aqueous solvent, to allow more flexibility. In this approach, a stable equilibrium molecular system is described by a phase field that takes one constant value in the solute region and a different constant value in the solvent region, and smoothly changes its value on a thin transition layer representing a smeared solute-solvent interface or dielectric boundary. Such a phase field minimizes an effective solvation free-energy functional that consists of the solute-solvent interfacial energy, solute-solvent van der Waals interaction energy, and electrostatic free energy described by the Poisson-Boltzmann theory. We apply our model and methods to the solvation of single ions, two parallel plates, and protein complexes BphC and p53/MDM2 to demonstrate the capability and efficiency of our approach at different levels. With a diffuse dielectric boundary, our new approach can describe the dielectric asymmetry in the solute-solvent interfacial region. Our theory is developed based on rigorous mathematical studies and is also connected to the Lum-Chandler-Weeks theory (1999). We discuss these connections and possible extensions of our theory and methods.
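To make the phase-field idea concrete, the one-dimensional sketch below builds a smooth profile φ that is close to 0 on one side and close to 1 on the other. It then evaluates a diffuse-interface energy of the common form γ∫(ξ/2 |φ'|² + W(φ)/ξ) dx, where γ is the surface tension and ξ the interface width.

The double-well W(φ) = 18φ²(1 - φ)², the tanh profile and the parameter names are assumptions for illustration, not the paper's exact functional. With this choice, the energy per unit area equals γ for any interface width ξ. That is how such a term recovers a sharp-interface surface energy as ξ shrinks.

```python
import math


def double_well(phi):
    # Assumed double-well potential with minima at phi = 0 and phi = 1.
    return 18.0 * phi ** 2 * (1.0 - phi) ** 2


def tanh_profile(x, xi):
    # Equilibrium profile for this W: ~0 for x << 0, ~1 for x >> 0, width ~xi.
    return 0.5 * (1.0 + math.tanh(3.0 * x / xi))


def interface_energy(xi, gamma=1.0, half_width=8.0, npts=4001):
    """Approximate gamma * integral (xi/2 phi'^2 + W(phi)/xi) dx across one interface."""
    dx = 2.0 * half_width / (npts - 1)
    phi = [tanh_profile(-half_width + i * dx, xi) for i in range(npts)]
    energy = 0.0
    for i in range(1, npts - 1):
        dphi = (phi[i + 1] - phi[i - 1]) / (2.0 * dx)  # central difference
        energy += (0.5 * xi * dphi ** 2 + double_well(phi[i]) / xi) * dx
    return gamma * energy


for xi in (1.0, 0.5):
    print(xi, interface_energy(xi))  # both values close to gamma = 1.0
```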
A self-consistent phase-field approach to implicit solvation of charged molecules with Poisson-Boltzmann electrostatics.

PubMed

Sun, Hui; Wen, Jiayi; Zhao, Yanxiang; Li, Bo; McCammon, J Andrew

2015-12-28

A self-consistent phase-field approach to implicit solvation of charged molecules with Poisson–Boltzmann electrostatics

PubMed Central

Sun, Hui; Wen, Jiayi; Zhao, Yanxiang; Li, Bo; McCammon, J. Andrew

2015-01-01

PMID:26723595
The role of attention in human motor resonance

PubMed Central

Leonetti, Antonella; Landau, Ayelet; Fornia, Luca; Cerri, Gabriella; Borroni, Paola

2017-01-01

Observation of others' actions evokes in primary motor cortex and spinal circuits of observers a subliminal motor resonance response, which reflects the motor program encoding observed actions. We investigated the role of attention in human motor resonance with four experimental conditions, explored in different subject groups. In the first (explicit) condition, subjects were asked to observe a rhythmic hand flexion-extension movement performed live in front of them. In two other conditions subjects had to monitor the activity of an LED light mounted on the oscillating hand. The hand was clearly visible but it was not the focus of subjects’ attention: in the semi-implicit condition hand movement was relevant to task completion, while in the implicit condition it was irrelevant. In a fourth, baseline, condition subjects observed the rhythmic oscillation of a metal platform. Motor resonance was measured with the H-reflex technique as the excitability modulation of cortico-spinal motoneurons driving a hand flexor muscle. As expected, a normal resonant response developed in the explicit condition, and no resonant response in the baseline condition. Resonant responses also developed in both semi-implicit and implicit conditions and, surprisingly, were not different from each other, indicating that viewing an action is, per se, a powerful stimulus for the action observation network, even when it is not the primary focus of subjects’ attention and even when irrelevant to the task. However, the amplitude of these responses was much reduced compared to the explicit condition, and the phase-lock between the time courses of observed movement and resonant motor program was lost.
In conclusion, different parameters of the response were differently affected by subtraction of attentional resources with respect to the explicit condition: time course and muscle selection were preserved while the activation of motor circuits resulted in much reduced amplitude and lost its kinematic specificity. PMID:28510605
PROTEUS two-dimensional Navier-Stokes computer code, version 1.0. Volume 2: User's guide

NASA Technical Reports Server (NTRS)

Towne, Charles E.; Schwab, John R.; Benson, Thomas J.; Suresh, Ambady

1990-01-01

A new computer code was developed to solve the two-dimensional or axisymmetric, Reynolds averaged, unsteady compressible Navier-Stokes equations in strong conservation law form. The thin-layer or Euler equations may also be solved. Turbulence is modeled using an algebraic eddy viscosity model. The objective was to develop a code for aerospace applications that is easy to use and easy to modify. Code readability, modularity, and documentation were emphasized. The equations are written in nonorthogonal body-fitted coordinates, and solved by marching in time using a fully-coupled alternating direction-implicit procedure with generalized first- or second-order time differencing. All terms are linearized using second-order Taylor series. The boundary conditions are treated implicitly, and may be steady, unsteady, or spatially periodic. Simple Cartesian or polar grids may be generated internally by the program. More complex geometries require an externally generated computational coordinate system. The documentation is divided into three volumes. Volume 2 is the User's Guide, and describes the program's general features, the input and output, the procedure for setting up initial conditions, the computer resource requirements, the diagnostic messages that may be generated, the job control language used to run the program, and several test cases.
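PROTEUS uses a fully coupled alternating direction-implicit procedure on the full Navier-Stokes system. As a much smaller model problem, the sketch below applies the Peaceman-Rachford ADI scheme to the 2D heat equation u_t = u_xx + u_yy on the unit square with zero Dirichlet boundaries. Each half step is implicit in one direction only, so it needs only scalar tridiagonal solves (Thomas algorithm). The model problem and all names are assumptions for illustration, not PROTEUS code.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a constant-coefficient tridiagonal system (Thomas algorithm)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):  # forward elimination
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def adi_step(u, r):
    """One Peaceman-Rachford step for u_t = u_xx + u_yy, with r = dt / (2 h^2).

    u[i][j] holds interior values (i: x index, j: y index); boundaries are zero.
    """
    n = len(u)

    def at(v, i, j):
        return v[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    # Half step 1: implicit in x, explicit in y.
    half = [[0.0] * n for _ in range(n)]
    for j in range(n):
        rhs = [u[i][j] + r * (at(u, i, j - 1) - 2 * u[i][j] + at(u, i, j + 1))
               for i in range(n)]
        line = thomas(-r, 1 + 2 * r, -r, rhs)
        for i in range(n):
            half[i][j] = line[i]
    # Half step 2: implicit in y, explicit in x.
    new = []
    for i in range(n):
        rhs = [half[i][j] + r * (at(half, i - 1, j) - 2 * half[i][j] + at(half, i + 1, j))
               for j in range(n)]
        new.append(thomas(-r, 1 + 2 * r, -r, rhs))
    return new


# Usage: decay of the sin(pi x) sin(pi y) mode to t = 0.1.
n = 19
h = 1.0 / (n + 1)
dt = 0.001
u = [[math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
      for j in range(n)] for i in range(n)]
for _ in range(100):
    u = adi_step(u, dt / (2 * h * h))
print(u[n // 2][n // 2], math.exp(-2 * math.pi ** 2 * 0.1))
```

The coupled scheme in the code itself replaces each scalar tridiagonal solve with a block-tridiagonal solve over all flow variables at a grid line.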
Modelling parallel programs and multiprocessor architectures with AXE

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.

1991-01-01

AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models and multiprocessor architectures, collecting data, and visualizing performance. AXE is being used at NASA Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turnaround time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters, including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Web Based Parallel Programming Workshop for Undergraduate Education.

ERIC Educational Resources Information Center

Marcus, Robert L.; Robertson, Douglass

Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide Web-based workshop on high performance computing entitled "IBM SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…
Adaptive Numerical Algorithms in Space Weather Modeling

NASA Technical Reports Server (NTRS)

Toth, Gabor; vanderHolst, Bart; Sokolov, Igor V.; DeZeeuw, Darren; Gombosi, Tamas I.; Fang, Fang; Manchester, Ward B.; Meng, Xing; Nakib, Dalal; Powell, Kenneth G.;

2010-01-01

Space weather describes the various processes in the Sun-Earth system that present danger to human health and technology. The goal of space weather forecasting is to provide an opportunity to mitigate these negative effects. Physics-based space weather modeling is characterized by disparate temporal and spatial scales as well as by different physics in different domains. A multi-physics system can be modeled by a software framework comprising several components. Each component corresponds to a physics domain, and each component is represented by one or more numerical models. The publicly available Space Weather Modeling Framework (SWMF) can execute and couple together several components distributed over a parallel machine in a flexible and efficient manner. The framework also allows resolving disparate spatial and temporal scales with independent spatial and temporal discretizations in the various models. Several of the computationally most expensive domains of the framework are modeled by the Block-Adaptive Tree Solar wind Roe Upwind Scheme (BATS-R-US) code that can solve various forms of the magnetohydrodynamics (MHD) equations, including Hall, semi-relativistic, multi-species and multi-fluid MHD, anisotropic pressure, radiative transport and heat conduction. Modeling disparate scales within BATS-R-US is achieved by a block-adaptive mesh both in Cartesian and generalized coordinates. Most recently we have created a new core for BATS-R-US: the Block-Adaptive Tree Library (BATL) that provides a general toolkit for creating, load balancing and message passing in a 1-, 2- or 3-dimensional block-adaptive grid. We describe the algorithms of BATL and demonstrate its efficiency and scaling properties for various problems.
BATS-R-US uses several time-integration schemes to address multiple time-scales: explicit time stepping with fixed or local time steps, partially steady-state evolution, point-implicit, semi-implicit, explicit/implicit, and fully implicit numerical schemes. Depending on the application, we find that different time stepping methods are optimal. Several of the time integration schemes exploit the block-based granularity of the grid structure. The framework and the adaptive algorithms enable physics based space weather modeling and even forecasting.
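Point-implicit schemes, one of the options listed above, treat stiff local source terms implicitly cell by cell while keeping the fluxes explicit. A minimal sketch illustrates why this matters. It assumes 1D linear advection with a relaxation source -k(u - u_eq), first-order upwinding and periodic boundaries, all hypothetical choices and not BATS-R-US code. With dt fixed by the advective CFL limit and dt·k ≫ 1, the fully explicit update blows up while the point-implicit one stays bounded.

```python
import math


def advance(u, a, dx, dt, k, u_eq, point_implicit=True):
    """One step of u_t + a u_x = -k (u - u_eq) on a periodic grid (a > 0).

    Advection uses explicit first-order upwinding; the relaxation source is
    treated either point-implicitly (per cell) or fully explicitly.
    """
    new = [0.0] * len(u)
    for i in range(len(u)):
        # u[i - 1] with i = 0 wraps to u[-1]: periodic boundary.
        flux_term = -a * (u[i] - u[i - 1]) / dx
        if point_implicit:
            # Solve u_new = u + dt*flux_term - dt*k*(u_new - u_eq) for u_new.
            new[i] = (u[i] + dt * flux_term + dt * k * u_eq) / (1.0 + dt * k)
        else:
            new[i] = u[i] + dt * flux_term - dt * k * (u[i] - u_eq)
    return new


n = 50
dx = 1.0 / n
dt = 0.5 * dx  # advective CFL number 0.5
u0 = [math.sin(2 * math.pi * i * dx) for i in range(n)]
u_pi, u_ex = u0, u0
for _ in range(5):
    u_pi = advance(u_pi, 1.0, dx, dt, 1.0e4, 0.0, point_implicit=True)
    u_ex = advance(u_ex, 1.0, dx, dt, 1.0e4, 0.0, point_implicit=False)
print(max(map(abs, u_pi)), max(map(abs, u_ex)))
```

Because the implicit part is local to each cell, it costs one division per cell here, or a small per-cell linear solve for a system, with no global matrix.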

SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel code. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
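The Normalize-Transpose (NT) law lets a function written for scalars apply to sequences without parallel annotations. It works in three moves:

1. Normalize: replicate scalar arguments to the sequence length.
2. Transpose: regroup the arguments into per-element tuples.
3. Apply the function to each tuple, recursively for nested sequences.

The sketch below is a hypothetical Python emulation of that behavior, not the SequenceL runtime. It assumes every parameter of `f` expects a scalar and that unequal sequence lengths are an error. Each call in the final comprehension is independent, which is the parallelism a SequenceL compiler can exploit.

```python
import operator


def nt_apply(f, *args):
    """Apply scalar function f, normalizing and transposing list arguments."""
    seqs = [a for a in args if isinstance(a, list)]
    if not seqs:
        return f(*args)  # all arguments are scalars: ordinary call
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("sequence arguments must have equal length")
    # Normalize: repeat each scalar argument n times.
    normalized = [a if isinstance(a, list) else [a] * n for a in args]
    # Transpose: the k-th call receives the k-th element of every argument.
    # The n calls are independent, so they could be evaluated in parallel.
    return [nt_apply(f, *row) for row in zip(*normalized)]


print(nt_apply(operator.add, [1, 2, 3], 10))               # [11, 12, 13]
print(nt_apply(operator.add, [[1, 2], [3, 4]], [10, 20]))  # [[11, 12], [23, 24]]
```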
Tools for Creating Mobile Applications for Extension

ERIC Educational Resources Information Center

Drill, Sabrina L.

2012-01-01

Considerations and tools for developing mobile applications for Extension include evaluating the topic, purpose, and audience. Different computing platforms may be used, and apps designed as modified Web pages or implicitly programmed for a particular platform. User privacy is another important consideration, especially for data collection apps.…
Morality and the Schools. Occasional Paper 32.

ERIC Educational Resources Information Center

Wicks, Robert S.

Moral contradictions and cross purposes in society make formal moral training in the schools difficult, if not impossible. Values clarification and school-wide programs of moral education are of questionable merit. Nevertheless, effective moral education is implicit in teaching the subjects that comprise good basic education. A mathematics…
The effect of implicitly incentivized faking on explicit and implicit measures of doping attitude: when athletes want to pretend an even more negative attitude to doping.

PubMed

Wolff, Wanja; Schindler, Sebastian; Brand, Ralf

2015-01-01

The Implicit Association Test (IAT) aims to measure participants' automatic evaluation of an attitude object and is useful especially for the measurement of attitudes related to socially sensitive subjects, e.g. doping in sports. Several studies indicate that IAT scores can be faked on instruction. But fully or semi-instructed research scenarios might not properly reflect what happens in more realistic situations, when participants secretly decide to try faking the test. The present study is the first to investigate IAT faking when there is only an implicit incentive to do so. Sixty-five athletes (mean age 22.83 ± 2.45 years; 25 women) were randomly assigned to an incentive-to-fake condition or a control condition. Participants in the incentive-to-fake condition were manipulated to believe that athletes with lenient doping attitudes would be referred to a tedious 45-minute anti-doping program. Attitudes were measured with the pictorial doping brief IAT (BIAT) and with the Performance Enhancement Attitude Scale (PEAS). A one-way MANOVA revealed significant differences between conditions after the manipulation in PEAS scores, but not in the doping BIAT. In light of our hypothesis, this suggests that participants successfully faked an exceedingly negative attitude to doping when completing the PEAS, but were unsuccessful in doing so on the reaction time-based test. This study assessed BIAT faking in a setting that aimed to resemble a situation in which participants want to hide their attempts to cheat. The two measures of attitude were differentially affected by the implicit incentive. Our findings provide evidence that the pictorial doping BIAT is relatively robust against spontaneous and naïve faking attempts. (B)IATs might be less prone to faking than implied by previous studies.

Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.

1990-01-01

The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized by using central differences. An algebraic turbulence model is also incorporated. The system of linear equations which arises out of the linearization of a fully implicit scheme is solved iteratively by the well-known methods of GMRES (Generalized Minimal Residual technique) and Chebyshev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes and has wide applications in parallel computing and unstructured mesh computations.
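As a rough illustration of a preconditioned Krylov solve of the kind described above, the following is a minimal sketch of left-preconditioned GMRES (Arnoldi with modified Gram-Schmidt, Givens rotations, no restarts) in pure Python. It is not the paper's solver: as an assumption for brevity it uses a point-Jacobi (diagonal) preconditioner on a tiny dense nonsymmetric matrix in place of the incomplete LU or block diagonal factorizations applied to a Navier-Stokes Jacobian. The function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def jacobi(A: Matrix) -> Callable[[Vector], Vector]:
    """Diagonal (point-Jacobi) preconditioner: returns v -> D^{-1} v."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda v: [vi / di for vi, di in zip(v, diag)]

def gmres(A: Matrix, b: Vector, precond: Callable[[Vector], Vector],
          tol: float = 1e-10, max_iter: int = 50) -> Vector:
    """Left-preconditioned GMRES without restarts, starting from x0 = 0."""
    n = len(b)
    r = precond(b)                      # M^{-1}(b - A x0) with x0 = 0
    beta = math.sqrt(dot(r, r))
    if beta == 0.0:
        return [0.0] * n
    V = [[ri / beta for ri in r]]       # orthonormal Krylov basis
    H: List[Vector] = []                # column j: rotated entries h[0..j+1]
    cs: Vector = []
    sn: Vector = []
    g = [beta]                          # rotated right-hand side beta * e1
    for j in range(max_iter):
        w = precond(matvec(A, V[j]))
        h: Vector = []
        for i in range(j + 1):          # modified Gram-Schmidt
            hij = dot(w, V[i])
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
        h_next = math.sqrt(dot(w, w))
        h.append(h_next)
        for i in range(j):              # apply earlier Givens rotations
            h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                              -sn[i] * h[i] + cs[i] * h[i + 1])
        d = math.hypot(h[j], h[j + 1])  # new rotation zeroes h[j + 1]
        cs.append(h[j] / d)
        sn.append(h[j + 1] / d)
        h[j], h[j + 1] = d, 0.0
        g.append(-sn[j] * g[j])         # |g[j + 1]| is the residual norm
        g[j] = cs[j] * g[j]
        H.append(h)
        if abs(g[j + 1]) < tol or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    k = len(H)
    y = [0.0] * k                       # back-substitute the triangular system
    for i in reversed(range(k)):
        y[i] = (g[i] - sum(H[m][i] * y[m] for m in range(i + 1, k))) / H[i][i]
    return [sum(V[m][i] * y[m] for m in range(k)) for i in range(n)]
```

The preconditioner enters only through the `precond` callable, so a block-diagonal or ILU preconditioner could be swapped in without touching the Krylov loop; the stopping test here is on the preconditioned residual.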
Aerodynamic optimization studies on advanced architecture computers

NASA Technical Reports Server (NTRS)

Chawla, Kalpana

1995-01-01

The approach to carrying out multi-discipline aerospace design studies in the future, especially in massively parallel computing environments, comprises choosing (1) suitable solvers to compute solutions to equations characterizing a discipline, and (2) efficient optimization methods. In addition, for aerodynamic optimization problems, (3) smart methodologies must be selected to modify the surface shape. In this research effort, a 'direct' optimization method is implemented on the Cray C-90 to improve aerodynamic design. It is coupled with an existing implicit Navier-Stokes solver, OVERFLOW, to compute flow solutions. The optimization method is chosen such that it can accommodate multi-discipline optimization in future computations. In the present work, however, only single-discipline aerodynamic optimization is included.
Accuracy of a class of concurrent algorithms for transient finite element analysis

NASA Technical Reports Server (NTRS)

Ortiz, Michael; Sotelino, Elisa D.; Nour-Omid, Bahram

1988-01-01

The accuracy of a new class of concurrent procedures for transient finite element analysis is examined. A phase error analysis is carried out which shows that wave retardation leading to unacceptable loss of accuracy may occur if a Courant condition based on the dimensions of the subdomains is violated. Numerical tests suggest that this Courant condition is conservative for typical structural applications and may lead to a marked increase in accuracy as the number of subdomains is increased. Theoretical speed-up ratios are derived which suggest that the algorithms under consideration can be expected to exhibit a performance superior to that of globally implicit methods when implemented on parallel machines.
Error analysis of multipoint flux domain decomposition methods for evolutionary diffusion problems

NASA Astrophysics Data System (ADS)

Arrarás, A.; Portero, L.; Yotov, I.

2014-01-01

We study space and time discretizations for mixed formulations of parabolic problems. The spatial approximation is based on the multipoint flux mixed finite element method, which reduces to an efficient cell-centered pressure system on general grids, including triangles, quadrilaterals, tetrahedra, and hexahedra. The time integration is performed by using a domain decomposition time-splitting technique combined with multiterm fractional step diagonally implicit Runge-Kutta methods. The resulting scheme is unconditionally stable and computationally efficient, as it reduces the global system to a collection of uncoupled subdomain problems that can be solved in parallel without the need for Schwarz-type iteration. Convergence analysis for both the semidiscrete and fully discrete schemes is presented.
Instrumentation, performance visualization, and debugging tools for multiprocessors

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

1991-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging and tuning parallel programs become intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, we are incorporating prototypes of software tools into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim to scale up to thousands of processors on both distributed and shared memory systems. Developing parallel programs for these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development is the tremendous progress of the Internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. The NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large-scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. To overcome the lack of an HPF performance model and to guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195 MHz) with the MIPSpro-f77 compiler 7.2.1 for the OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for the HPF programs.
Parallel computation with the force

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1985-01-01

A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
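The force style, in which every process executes the same program text and the work is split among an unspecified number of processes, can be sketched in Python with threads standing in for the force's processes. This sketch assumes two constructs commonly associated with the force: a prescheduled loop (process `me` of `nproc` statically takes iterations `me`, `me + nproc`, ...) and a barrier. It is an illustration only, not the force macros or the FORTRAN they generate; `force_run` and `body` are hypothetical names. Because of CPython's global interpreter lock, the threads show the structure of the computation rather than a real speedup.

```python
import threading
from typing import List, Tuple

def force_run(nproc: int, data: List[float]) -> Tuple[List[float], float]:
    """Run one body on `nproc` threads in SPMD style and return
    (element-wise squares, sum of squares)."""
    squares = [0.0] * len(data)
    partial = [0.0] * nproc              # one slot per process, so no lock is needed
    total = [0.0]
    barrier = threading.Barrier(nproc)

    def body(me: int) -> None:
        # Prescheduled loop: process `me` takes iterations me, me + nproc, ...
        for i in range(me, len(data), nproc):
            squares[i] = data[i] * data[i]
            partial[me] += squares[i]
        barrier.wait()                   # every process has finished its iterations
        if me == 0:                      # one process combines the partial sums
            total[0] = sum(partial)
        barrier.wait()                   # total[0] is now visible to all processes

    threads = [threading.Thread(target=body, args=(p,)) for p in range(nproc)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return squares, total[0]
```

Because the loop body never mentions a specific process count, the same code runs unchanged with one process or many, which mirrors the abstract's point that the number of processes in the force is left unspecified.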
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.

2003-01-01

Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Contributions to DoD Mission Success from High Performance Computing - March 1995

DTIC Science & Technology

1995-03-01

the flow . The physics to be considered may entail additional force fields, coupling to surface physics and microphysics, changes of phase, changes...in this program concerns the structural mechanics of bolted-on propeller blades. An important objective of the program was to determine the effects of...motion between the rotor blades and the airframe. The flow past each component is then computed using an efficient, implicit three-dimensional unsteady
Improving DoD Logistics: Perspectives from RAND Research,

DTIC Science & Technology

1995-01-01

in-theater, particularly in the initial stages of the deployment), severe competition for strategic lift, and a more dynamic environment in which...the explicit or implicit intent to protect DoD providers and public-sector jobs.55 The lack of performance evaluation in favor of an emphasis on...programs to implement and propagate process improvements more widely. This chart lists elements of some of these programs. The intent is not to evaluate
76 FR 62808 - Pilot Program for Parallel Review of Medical Products

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-11

... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...
Algorithms and programming tools for image processing on the MPP

NASA Technical Reports Server (NTRS)

Reeves, A. P.

1985-01-01

Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Execution models for mapping programs onto distributed memory parallel computers

NASA Technical Reports Server (NTRS)

Sussman, Alan

1992-01-01

The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Program Correctness, Verification and Testing for Exascale (Corvette)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sen, Koushik; Iancu, Costin; Demmel, James W

The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partial program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.
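One reason floating point programs are hard to reproduce is that floating point addition is not associative, so a parallel reduction that combines partial sums in a schedule-dependent order can give different answers from run to run. The following minimal Python illustration is not the project's technique: it contrasts a naive left-to-right sum, whose result depends on input order, with `math.fsum` from the standard library, which returns the correctly rounded sum of its inputs and is therefore independent of their order. The helper names are hypothetical.

```python
import math
from typing import List

def naive_sum(xs: List[float]) -> float:
    """Left-to-right accumulation: the result depends on the order of xs."""
    total = 0.0
    for x in xs:
        total += x
    return total

def reproducible_sum(xs: List[float]) -> float:
    """Correctly rounded sum, hence the same for any ordering of xs."""
    return math.fsum(xs)

left = (0.1 + 0.2) + 0.3    # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6
```

With inputs of very different magnitude the effect is larger: `naive_sum([1e17, 1.0, -1e17])` loses the `1.0` entirely, while `naive_sum([1e17, -1e17, 1.0])` keeps it. Exact summation like `fsum` costs more than a plain loop; cheaper reproducible schemes exist, but they are beyond this sketch.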
Aspect-Oriented Programming is Quantification and Implicit Invocation

NASA Technical Reports Server (NTRS)

Filman, Robert E.; Friedman, Daniel P.; Koga, Dennis (Technical Monitor)

2001-01-01

We propose that the distinguishing characteristic of Aspect-Oriented Programming (AOP) languages is that they allow programming by making quantified programmatic assertions over programs that lack local notation indicating the invocation of these assertions. This suggests that AOP systems can be analyzed with respect to three critical dimensions: the kinds of quantifications allowed, the nature of the interactions that can be asserted, and the mechanism for combining base-level actions with asserted actions. Consequences of this perspective are the recognition that certain systems are not AOP and that some mechanisms are metaAOP: they are sufficiently expressive to allow straightforwardly programming an AOP system within them.
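The three dimensions named above can be made concrete with a small Python sketch: the quantification is a method-name pattern, the asserted interaction is "call `advice(name)`", and the combination mechanism is "run the advice before the base-level method". The base class contains no local notation of any of this, so the advice is invoked implicitly. This is an illustrative assumption only; `weave`, `_with_advice` and `Account` are hypothetical, and no real AOP framework's API is implied.

```python
import fnmatch
import functools
from typing import Any, Callable

def _with_advice(fn: Callable[..., Any], name: str,
                 advice: Callable[[str], None]) -> Callable[..., Any]:
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        advice(name)                     # asserted action runs first
        return fn(*args, **kwargs)       # then the base-level action
    return wrapper

def weave(cls: type, pattern: str, advice: Callable[[str], None]) -> type:
    """Quantify over every method of `cls` whose name matches `pattern`."""
    for name, member in list(vars(cls).items()):
        if callable(member) and fnmatch.fnmatchcase(name, pattern):
            setattr(cls, name, _with_advice(member, name, advice))
    return cls

class Account:
    """Base-level code: nothing here mentions logging or advice."""
    def __init__(self) -> None:
        self.balance = 0
        self.owner = ""

    def set_balance(self, value: int) -> None:
        self.balance = value

    def set_owner(self, owner: str) -> None:
        self.owner = owner

    def get_balance(self) -> int:
        return self.balance
```

For instance, `weave(Account, "set_*", log.append)` makes every later `set_balance` or `set_owner` call append its own name to `log`, while `get_balance` is untouched. Because Python lets ordinary code rewrite classes this way, the sketch also hints at the abstract's notion of a mechanism expressive enough to program an AOP system within it.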
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Trace-Driven Debugging of Message Passing Programs

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

1998-01-01

In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
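Setting consistent breakpoints requires knowing which events causally precede which across processes. The abstract does not say how the debugger tracks causality. One standard mechanism, assumed here purely for illustration, is vector clocks: each event carries a vector timestamp, and a set of per-process "last kept" events is a consistent cut if no kept event depends on an event of another process that the cut leaves out. All names below are hypothetical.

```python
from typing import List, Tuple

Clock = Tuple[int, ...]

class Process:
    """A process that stamps each local, send and receive event with a vector clock."""
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.clock = [0] * n

    def _tick(self) -> Clock:
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def local(self) -> Clock:
        return self._tick()

    def send(self) -> Clock:
        return self._tick()              # the message carries this timestamp

    def receive(self, msg: Clock) -> Clock:
        self.clock = [max(a, b) for a, b in zip(self.clock, msg)]
        return self._tick()

def happened_before(a: Clock, b: Clock) -> bool:
    return all(x <= y for x, y in zip(a, b)) and a != b

def consistent(frontier: List[Clock]) -> bool:
    """frontier[i] is the last event of process i kept before the breakpoint.
    The cut keeps frontier[j][j] events of process j; it is consistent if no
    kept event has seen more events of any process j than that."""
    n = len(frontier)
    return all(frontier[i][j] <= frontier[j][j] for i in range(n) for j in range(n))
```

A trace-based debugger could store these timestamps with each traced event and, when a breakpoint is hit on one process, choose stopping points on the others that keep the cut consistent.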
Exploiting Symmetry on Parallel Architectures.

NASA Astrophysics Data System (ADS)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
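The simplest instance of a group-equivariant matrix is a circulant matrix, which commutes with cyclic shifts (the cyclic group Z_n). Multiplying by it is a cyclic convolution, and the Fourier transform over Z_n, the ordinary discrete Fourier transform, diagonalizes it. The sketch below checks both facts numerically, using a naive O(n^2) DFT for clarity. It illustrates only the general principle; the thesis's dihedral and symmetric-group transforms, orbit decompositions and parallel implementations are not shown, and all function names are hypothetical.

```python
import cmath
from typing import List, Sequence

def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive DFT over Z_n (sign -1), or its inverse (sign +1, scaled by 1/n)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def circulant_matvec_direct(c: List[float], x: List[float]) -> List[float]:
    """y = C x with C[i][j] = c[(i - j) % n]; C commutes with cyclic shifts."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matvec_fourier(c: List[float], x: List[float]) -> List[float]:
    """Same product via the Fourier transform over Z_n: y = IDFT(DFT(c) * DFT(x))."""
    fc, fx = dft(c), dft(x)
    y = dft([a * b for a, b in zip(fc, fx)], inverse=True)
    return [v.real for v in y]

def shift(v: List[float]) -> List[float]:
    """Cyclic shift by one position: the generator of Z_n acting on vectors."""
    return v[-1:] + v[:-1]
```

In the Fourier basis the n-by-n matrix becomes n independent scalar multiplications, which is the kind of structure that makes such products attractive to distribute across processors.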
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

NASA Astrophysics Data System (ADS)

Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

1995-03-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.

MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simunovic, S.; Zacharia, T.; Baltas, N.

1995-04-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel computing clusters. Individual compute nodes may consist of an x86 CPU, a Cell processor, or an x86 CPU with an attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
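The core pattern behind dividing an explicit structured-grid solver across MPI ranks is a halo (ghost-cell) exchange before each time step. The sketch below imitates that pattern in a single Python process, under strong simplifying assumptions: a 1D first-order upwind scheme for linear advection (not SWsolver's high-resolution 2D shallow water scheme), a periodic grid split into equal blocks, and a hypothetical `exchange_halos` that copies neighbour values where an MPI code would send and receive messages. The check is that the decomposed run reproduces the single-domain run exactly.

```python
from typing import List

def upwind_step(u: List[float], cfl: float) -> List[float]:
    """One explicit upwind step for u_t + a u_x = 0 (a > 0) on a periodic grid.
    cfl = a*dt/dx, with 0 <= cfl <= 1 required for stability."""
    n = len(u)
    return [u[i] - cfl * (u[i] - u[i - 1]) for i in range(n)]  # u[-1] wraps around

def split(u: List[float], parts: int) -> List[List[float]]:
    size = len(u) // parts               # assumes len(u) is divisible by parts
    return [u[p * size:(p + 1) * size] for p in range(parts)]

def exchange_halos(blocks: List[List[float]]) -> List[List[float]]:
    """Prepend each block's left ghost cell: the last value of its periodic
    left neighbour. An MPI code would obtain it with a send/receive pair."""
    return [[blocks[p - 1][-1]] + blocks[p] for p in range(len(blocks))]

def distributed_step(blocks: List[List[float]], cfl: float) -> List[List[float]]:
    padded = exchange_halos(blocks)
    # Each block updates its own cells using only local data plus one ghost cell.
    return [[b[i] - cfl * (b[i] - b[i - 1]) for i in range(1, len(b))] for b in padded]
```

A wider stencil, such as a high-resolution scheme, would need a correspondingly wider halo on each side, and a 2D grid needs exchanges along each partitioned direction; the structure of "exchange, then update locally" stays the same.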
ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snyder, L.; Notkin, D.; Adams, L.

1990-03-31

This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended, and nonshared memory models of parallel computation were investigated. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computation. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph.D. thesis was completed, and one book chapter and six conference proceedings were published.
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2002-01-01

Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
STAFF REPORT ON CONFERENCE FINDINGS. APPENDIX II.

ERIC Educational Resources Information Center

National Council on the Aging, Inc., New York, NY.

Conference speakers, panelists, and workshops were asked to focus on the problems of the 45-plus age group and to make specific recommendations on public policy, legislation, and public and private research, program and action. In addition, staff recommendations which were implicit in the proceedings were included. Under each sub-heading,…
Listening Skills Training: Application to Crisis Intervention Programs.

ERIC Educational Resources Information Center

Coonfield, Ted J.; And Others

A review of the literature in listening behavior reveals an increasing interest in the importance of listening in the communication process and the therapeutic situation. Since crisis workers are continually confronted with feeling-laden messages in which the implicit, unspoken, and covert content is vital, empathic listening is a necessity. The…
Discipline for Democracy? School Districts' Management of Conflict and Social Exclusion

ERIC Educational Resources Information Center

Bickmore, Kathy

2004-01-01

An examination of six urban Canadian school districts' policies and co-curricular programs for safe and inclusive schools shows contrasting implicit patterns of citizenship education. Peacekeeping-oriented districts relied heavily on standardized control and exclusion to achieve school safety and allocated few resources to affirming diversity.…
Polymorphous Computing Architectures

DTIC Science & Technology

2007-12-12

provide a multiprocessor implementation. In this work, we introduce the Atomos transactional programming language, which is the first to include ... implicit transactions, strong atomicity, and a scalable multiprocessor implementation [47]. Atomos is derived from Java, but replaces its synchronization ... and conditional waiting constructs with transactional alternatives. The Atomos conditional waiting proposal is tailored to allow efficient
A comparative analysis of locally based conservation education programs that promote issue awareness and community solutions within Honduras and the United States

NASA Astrophysics Data System (ADS)

Weber, Nicole R.

Public understanding of and concern for environmental issues are critical to conservation efforts. In this study, I investigated education programs focused on local environmental issues and their impact on sense of place, environmental knowledge, empowerment, and awareness (in Honduras and Boston). I hypothesized that the curriculum would have an effect on multiple student measures and that teachers who participated in workshops would have greater ownership of the curriculum, influencing the curriculum's effectiveness. I then examined the relation of environmental knowledge to environmental connection at the regional (Honduras) and international (Honduras vs. United States) levels, comparing cultural differences in the same measures. I hypothesized that a population connected to its natural surroundings would have an embedded biological understanding and appreciation of those surroundings. I surveyed a total of 887 students (727 in Honduras, 160 in Boston) and 293 teachers (Honduras), including participant and nonparticipant teachers, in a pre/post/follow-up survey design. To evaluate these hypotheses, I used multiple measures to assess program success and regional differences: implicit measures (general sense of place); explicit measures (knowledge of problems and solutions; degree of specificity in thinking about these issues); and affective and attitudinal components (sense of empowerment). For the exploratory study, I gathered parallel data from teachers so that the effects of the program on both teachers and students would be evident. Our results indicate significant changes in the number of problem and solution types proposed by students, and that by the end of the program students' responses matched those of their teacher on some measures (but not all). In Honduras, the main effect of participating in the teacher workshop appears to be on teachers' willingness to teach environmental education.
Results for students' sense of place and environmental empowerment were inconsistent across programs. In addition, participants (teachers and students) were not at the ceiling (as experts) on a number of measures, suggesting that the workshops and curriculum can be further improved. For the comparative study, there was strong support for a population's connection to its local natural surroundings being strongly related to its sense of place and partially related to heightened environmental awareness.
The parallel programming of voluntary and reflexive saccades.

PubMed

Walker, Robin; McSorley, Eugene

2006-06-01

A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.

2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and the full tuning control it gives the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows the parallelization to be broken up into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid the construction of message-passing programs from scratch. None, however, allows piecemeal translation of codes with complex data dependencies (i.e., non-data-parallel programs) into message-passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two at any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact but allow execution on truly distributed data.
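The idea of mapping between a non-distributed (legacy) array and a distributed one can be illustrated with a small single-process sketch. This is not Charon's API: `BlockDistribution`, `owner_of`, `scatter`, and `gather` are hypothetical names, and the sketch assumes a simple 1D block distribution with ranks simulated as list entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockDistribution:
    """Hypothetical 1D block distribution of n global elements over nprocs ranks."""
    n: int
    nprocs: int

    def bounds(self, rank: int) -> Tuple[int, int]:
        """Half-open global index range [lo, hi) owned by `rank`."""
        base, extra = divmod(self.n, self.nprocs)
        # The first `extra` ranks each get one additional element.
        lo = rank * base + min(rank, extra)
        hi = lo + base + (1 if rank < extra else 0)
        return lo, hi

    def owner_of(self, i: int) -> Tuple[int, int]:
        """Map a global index to (owning rank, local index on that rank)."""
        for rank in range(self.nprocs):
            lo, hi = self.bounds(rank)
            if lo <= i < hi:
                return rank, i - lo
        raise IndexError(f"global index {i} out of range")


def scatter(legacy: List[float], dist: BlockDistribution) -> List[List[float]]:
    """Non-distributed (legacy) array -> one local piece per simulated rank."""
    return [legacy[lo:hi] for lo, hi in (dist.bounds(r) for r in range(dist.nprocs))]


def gather(pieces: List[List[float]], dist: BlockDistribution) -> List[float]:
    """Local pieces -> non-distributed array that unconverted legacy code can keep using."""
    out = [0.0] * dist.n
    for rank, piece in enumerate(pieces):
        lo, hi = dist.bounds(rank)
        out[lo:hi] = piece
    return out
```

During an incremental conversion, a loop that has already been ported could work on the `scatter`-ed pieces while untouched code reads the `gather`-ed copy. In a real message-passing code these two functions would be collective communications rather than list copies.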
Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured-grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process, most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction to the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline, the prototypical non-data-parallel algorithm, can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
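Why a two-dimensional recurrence is "non-data-parallel," and how pipelining still extracts parallelism, can be shown with a simulated pipeline. This is a sketch under stated assumptions, not the paper's code: the grid columns are split into blocks owned by hypothetical "processors," and each processor starts row i only after its left neighbour has finished that row (the value it would otherwise receive in a message).

```python
from typing import Callable, List, Tuple

Update = Callable[[float, float], float]


def new_grid(nrows: int, ncols: int) -> List[List[float]]:
    # Row 0 and column 0 act as fixed boundary values.
    return [[1.0] * ncols for _ in range(nrows)]


def serial_sweep(nrows: int, ncols: int, f: Update) -> List[List[float]]:
    """Reference sweep: a[i][j] depends on the point above and the point to the left."""
    a = new_grid(nrows, ncols)
    for i in range(1, nrows):
        for j in range(1, ncols):
            a[i][j] = f(a[i - 1][j], a[i][j - 1])
    return a


def pipelined_sweep(nrows: int, ncols: int, nprocs: int, f: Update) -> Tuple[List[List[float]], int]:
    """Simulated pipeline over column blocks. At stage t, processor p handles row 1 + t - p,
    so every value it reads was produced in an earlier stage."""
    a = new_grid(nrows, ncols)
    blocks = [(p * ncols // nprocs, (p + 1) * ncols // nprocs) for p in range(nprocs)]
    nstages = (nrows - 1) + (nprocs - 1)  # pipeline fill plus drain
    for t in range(nstages):
        for p, (lo, hi) in enumerate(blocks):
            i = 1 + t - p
            if 1 <= i < nrows:
                for j in range(max(lo, 1), hi):
                    a[i][j] = f(a[i - 1][j], a[i][j - 1])
    return a, nstages
```

With `nprocs` processors, the sweep takes `(nrows - 1) + (nprocs - 1)` stages instead of `nrows - 1` sequential row sweeps of the full width, which is the usual pipeline fill/drain trade-off.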
78 FR 76628 - Pilot Program for Parallel Review of Medical Products; Extension of the Duration of the Program

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-18

...The Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) (the Agencies) are announcing the extension of the "Pilot Program for Parallel Review of Medical Products." The Agencies have decided to continue the program as currently designed for an additional period of 2 years from the date of publication of this notice.
Scalable Nonlinear Solvers for Fully Implicit Coupled Nuclear Fuel Modeling. Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cai, Xiao-Chuan; Keyes, David; Yang, Chao

2014-09-29

The focus of the project is on the development and customization of some highly scalable domain-decomposition-based preconditioning techniques for the numerical solution of nonlinear, coupled systems of partial differential equations (PDEs) arising from nuclear fuel simulations. These high-order PDEs represent multiple interacting physical fields (for example, heat conduction, oxygen transport, and solid deformation), each modeled by a Cahn-Hilliard and/or Allen-Cahn type equation. Most existing approaches involve a careful splitting of the fields and the use of field-by-field iterations to obtain a solution of the coupled problem. Such approaches have advantages, such as ease of implementation, since only single-field solvers are needed, but they also have disadvantages. For example, certain nonlinear interactions between the fields may not be fully captured, and for unsteady problems, stable time integration schemes are difficult to design. In addition, when implemented on large-scale parallel computers, the sequential nature of the field-by-field iterations substantially reduces parallel efficiency. To overcome these disadvantages, fully coupled approaches have been investigated in order to obtain full-physics simulations.
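The contrast between field-by-field iteration and a fully coupled solve can be seen on a toy two-field algebraic system. This is only an illustration under assumed data: the system u = A cos(v), v = A sin(u) with the hypothetical coupling constant `A = 0.5` stands in for the coupled PDEs, and plain Newton stands in for the preconditioned Newton-Krylov solvers used in practice.

```python
import math
from typing import Tuple

A = 0.5  # hypothetical coupling strength of the toy problem


def residual(u: float, v: float) -> Tuple[float, float]:
    return u - A * math.cos(v), v - A * math.sin(u)


def field_by_field(tol: float = 1e-10, max_iter: int = 200) -> Tuple[float, float, int]:
    """Block Gauss-Seidel: solve each field with the other frozen, one after another."""
    u = v = 0.0
    for k in range(1, max_iter + 1):
        u = A * math.cos(v)  # field 1 with field 2 frozen
        v = A * math.sin(u)  # field 2 with the updated field 1
        r1, r2 = residual(u, v)
        if max(abs(r1), abs(r2)) < tol:
            return u, v, k
    raise RuntimeError("field-by-field iteration did not converge")


def fully_coupled_newton(tol: float = 1e-10, max_iter: int = 50) -> Tuple[float, float, int]:
    """Newton on the coupled system, using the full Jacobian including cross-field terms."""
    u = v = 0.0
    for k in range(1, max_iter + 1):
        f1, f2 = residual(u, v)
        # Jacobian [[1, A sin v], [-A cos u, 1]]; solve the 2x2 system by Cramer's rule.
        det = 1.0 + A * A * math.sin(v) * math.cos(u)
        du = (-f1 + A * math.sin(v) * f2) / det
        dv = (-f2 - A * math.cos(u) * f1) / det
        u, v = u + du, v + dv
        r1, r2 = residual(u, v)
        if max(abs(r1), abs(r2)) < tol:
            return u, v, k
    raise RuntimeError("Newton iteration did not converge")
```

Both reach the same root, but the coupled Newton solve needs fewer iterations because it accounts for the cross-field derivatives that the split iteration ignores. The split loop is also inherently sequential in the fields, which mirrors the parallel-efficiency concern raised above.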
High-resolution multi-code implementation of unsteady Navier-Stokes flow solver based on paralleled overset adaptive mesh refinement and high-order low-dissipation hybrid schemes

NASA Astrophysics Data System (ADS)

Li, Gaohua; Fu, Xiang; Wang, Fuxin

2017-10-01

The low-dissipation, high-order accurate hybrid upwind/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and flow-feature-based adaptive mesh refinement (AMR) are implemented in a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolution. The overset grid assembly (OGA) process, based on collection detection theory and an implicit hole-cutting algorithm, achieves automatic coupling of the near-body and off-body solvers, and a trial-and-error method is used to obtain a globally balanced load distribution among the multiple constituent codes. The results for flows over a high-Reynolds-number cylinder and a two-bladed helicopter rotor show that the combination of a high-order hybrid scheme, an advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution of simulated turbulent wake eddies.
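To show what the fifth-order WENO component does, here is a sketch of the classic Jiang-Shu WENO5 reconstruction at a cell interface. It is an assumption that this variant matches the paper's; the hybrid scheme's central branch and its switching sensor are not shown. The function name `weno5_js` and the default `eps` are illustrative.

```python
from typing import Sequence


def weno5_js(v: Sequence[float], eps: float = 1e-6) -> float:
    """Left-biased WENO5-JS reconstruction at interface i+1/2.

    v holds five cell averages: v[i-2], v[i-1], v[i], v[i+1], v[i+2].
    """
    vm2, vm1, v0, vp1, vp2 = v
    # Third-order candidate reconstructions on the three 3-cell sub-stencils.
    p0 = (2 * vm2 - 7 * vm1 + 11 * v0) / 6
    p1 = (-vm1 + 5 * v0 + 2 * vp1) / 6
    p2 = (2 * v0 + 5 * vp1 - vp2) / 6
    # Smoothness indicators: large when a sub-stencil straddles a discontinuity.
    b0 = 13 / 12 * (vm2 - 2 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4 * vm1 + 3 * v0) ** 2
    b1 = 13 / 12 * (vm1 - 2 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13 / 12 * (v0 - 2 * vp1 + vp2) ** 2 + 0.25 * (3 * v0 - 4 * vp1 + vp2) ** 2
    # Linear weights 1/10, 6/10, 3/10 recover fifth order where the data are smooth.
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    return (a0 * p0 + a1 * p1 + a2 * p2) / (a0 + a1 + a2)
```

For linear data all three candidates agree and the weights reduce to the linear ones. Next to a jump, the weight of the smooth left stencil dominates and the reconstruction stays free of overshoot, at the cost of extra dissipation. That dissipation is what a hybrid upwind/central scheme tries to confine to regions that actually need it.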
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2016-01-01

This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies on NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on the execution time of individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
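Load balancing driven by measured per-block execution time can be sketched with a greedy "longest processing time first" assignment. The abstract does not say which partitioning algorithm is used, so this heuristic is an assumption. The block names and timings in the check are hypothetical, with the most expensive block standing in for one that contains an immersed boundary.

```python
import heapq
from typing import Dict, List


def balance_blocks(block_times: Dict[str, float], nranks: int) -> List[List[str]]:
    """Assign the most expensive remaining block to the currently least-loaded rank."""
    heap = [(0.0, rank) for rank in range(nranks)]  # (accumulated time, rank)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(nranks)]
    # Sort by descending measured time; the name breaks ties deterministically.
    for block, t in sorted(block_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(block)
        heapq.heappush(heap, (load + t, rank))
    return assignment
```

Using measured times rather than cell counts is the key point: two blocks with the same number of cells can cost very different amounts when one of them has to handle immersed boundary cells.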
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2015-01-01

This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies on NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on the execution time of individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
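The temporally fourth-order Runge-Kutta integration can be illustrated with the classic four-stage RK4 step. The abstract does not specify which fourth-order variant is used, so the classic scheme here is an assumption. In a CFD code, `f` would be the semi-discrete right-hand side (for example, the WENO flux divergence) rather than the scalar test function used in the check.

```python
from typing import Callable

RHS = Callable[[float, float], float]


def rk4_step(f: RHS, t: float, y: float, h: float) -> float:
    """One classic fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f: RHS, y0: float, t_end: float, nsteps: int) -> float:
    """Advance from t = 0 to t_end with nsteps equal RK4 steps."""
    h = t_end / nsteps
    y = y0
    for i in range(nsteps):
        y = rk4_step(f, i * h, y, h)
    return y
```

Halving the step size should reduce the global error by about a factor of 16, which is a quick way to confirm the fourth-order claim on a problem with a known solution such as dy/dt = y.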
Three dimensional modelling of earthquake rupture cycles on frictional faults

NASA Astrophysics Data System (ADS)

Simpson, Guy; May, Dave

2017-04-01

We are developing an efficient MPI-parallel numerical method to simulate earthquake sequences on preexisting faults embedded within a three-dimensional viscoelastic half-space. We solve the velocity form of the elasto(visco)dynamic equations using a continuous Galerkin finite element method on an unstructured pentahedral mesh, which permits local spatial refinement in the vicinity of the fault. Frictional sliding is coupled to the viscoelastic solid via rate- and state-dependent friction laws using the split-node technique. Our coupled formulation employs a Picard-type nonlinear solver with a fully implicit, first-order accurate time integrator that uses an adaptive time step to efficiently evolve the system through multiple seismic cycles. The implementation leverages advanced parallel solvers, preconditioners, and linear algebra from the Portable, Extensible Toolkit for Scientific Computation (PETSc) library. The model can treat heterogeneous frictional properties and stress states on the fault and in the surrounding solid, as well as non-planar fault geometries. Preliminary tests show that the model successfully reproduces dynamic rupture on a vertical strike-slip fault in a half-space governed by rate-state friction with the ageing law.
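The rate-and-state friction law with the ageing law, advanced with a fully implicit first-order (backward Euler) step, can be sketched as follows. The standard Dieterich-Ruina form is assumed, and every parameter default (`mu0`, `a`, `b`, `V0`, `Dc`) is a hypothetical illustrative value, not taken from the paper. Holding the slip rate fixed during the state update is a simplification; the coupled solver described above iterates slip rate and state together.

```python
import math


def friction_coefficient(V: float, theta: float, mu0: float = 0.6, a: float = 0.01,
                         b: float = 0.015, V0: float = 1e-6, Dc: float = 1e-4) -> float:
    """mu = mu0 + a ln(V / V0) + b ln(V0 theta / Dc)  (hypothetical parameter values)."""
    return mu0 + a * math.log(V / V0) + b * math.log(V0 * theta / Dc)


def ageing_law_backward_euler(theta: float, V: float, dt: float, Dc: float = 1e-4) -> float:
    """One backward Euler step of d(theta)/dt = 1 - V theta / Dc at fixed slip rate V.

    theta_new = theta + dt (1 - V theta_new / Dc) is linear in theta_new, so it is
    solved in closed form; the step is unconditionally stable for any dt > 0.
    """
    return (theta + dt) / (1.0 + dt * V / Dc)
```

At steady state the state variable relaxes to `Dc / V`, and the friction coefficient becomes `mu0 + (a - b) ln(V / V0)`; with `a < b`, as in the defaults, friction weakens with increasing slip rate, which is the condition for unstable, earthquake-like slip.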
Integrated Task and Data Parallel Programming

NASA Technical Reports Server (NTRS)

Grimshaw, A. S.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
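The idea of a task-parallel object managing data-parallel objects can be approximated in ordinary Python with thread pools. This is a structural sketch only, not the language or the Legion runtime described above: `DataParallelArray`, `TaskManager`, `split`, and `demo` are hypothetical names. Because of CPython's global interpreter lock, the threads here show the organization of the computation rather than real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split(values: Sequence[int], nchunks: int) -> List[List[int]]:
    """Split values into nchunks contiguous, nearly equal pieces."""
    size, extra = divmod(len(values), nchunks)
    chunks, start = [], 0
    for k in range(nchunks):
        end = start + size + (1 if k < extra else 0)
        chunks.append(list(values[start:end]))
        start = end
    return chunks


class DataParallelArray:
    """Hypothetical data-parallel object: one operation is applied to all chunks concurrently."""

    def __init__(self, values: Sequence[int], nchunks: int, pool: ThreadPoolExecutor):
        self.chunks = split(values, nchunks)
        self.pool = pool

    def map(self, fn: Callable[[int], int]) -> "DataParallelArray":
        self.chunks = list(self.pool.map(lambda chunk: [fn(x) for x in chunk], self.chunks))
        return self

    def total(self) -> int:
        return sum(self.pool.map(sum, self.chunks))


class TaskManager:
    """Hypothetical task-parallel object running independent jobs that may drive data-parallel objects."""

    def __init__(self, pool: ThreadPoolExecutor):
        self.pool = pool

    def run(self, jobs: List[Callable[[], int]]) -> List[int]:
        futures = [self.pool.submit(job) for job in jobs]
        return [f.result() for f in futures]


def demo() -> List[int]:
    # Separate pools: task threads that block on data-parallel results never occupy
    # the workers needed to compute those results.
    with ThreadPoolExecutor(max_workers=2) as task_pool, ThreadPoolExecutor(max_workers=4) as data_pool:
        squares = DataParallelArray(list(range(10)), 3, data_pool)
        shifted = DataParallelArray(list(range(5)), 2, data_pool)
        return TaskManager(task_pool).run([
            lambda: squares.map(lambda x: x * x).total(),
            lambda: shifted.map(lambda x: x + 1).total(),
        ])
```

`demo()` runs two independent tasks concurrently, and each task applies a data-parallel operation across chunks, mirroring the "task parallel objects managing data parallel objects" pattern in miniature.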
Integrated Task And Data Parallel Programming: Language Design

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Parallel Cartesian grid refinement for 3D complex flow simulations

NASA Astrophysics Data System (ADS)

Angelidis, Dionysios; Sotiropoulos, Fotis

2013-11-01

A second-order accurate method for discretizing the Navier-Stokes equations on 3D unstructured Cartesian grids is presented. Although the grid generator is based on the oct-tree hierarchical method, a fully unstructured data structure is adopted, enabling robust calculations for incompressible flows and avoiding both the need for synchronization of the solution between different levels of refinement and the use of prolongation/restriction operators. The current solver implements a hybrid staggered/non-staggered grid layout, employing the implicit fractional step method to satisfy the continuity equation. The pressure-Poisson equation is discretized using a novel second-order, fully implicit scheme for unstructured Cartesian grids and solved using an efficient Krylov subspace solver. The momentum equation is also discretized with second-order accuracy, and the high-performance Newton-Krylov method is used to integrate it in time. Neumann and Dirichlet conditions are used to validate the Poisson solver against analytical functions, and grid refinement results in a significant reduction of the solution error. The effectiveness of the fractional step method ensures the stability of the overall algorithm and enables accurate multi-resolution simulations of real-life flows. This material is based upon work supported by the Department of Energy under Award Number DE-EE0005482.
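The validation step (a Poisson solve with a Krylov method, checked against an analytical function under grid refinement) can be reproduced in miniature in 1D. The assumptions here are deliberate simplifications: a uniform 1D grid instead of an unstructured Cartesian one, homogeneous Dirichlet conditions, and unpreconditioned conjugate gradient. CG is one Krylov method that fits a symmetric positive definite operator; the abstract does not name the solver actually used. The manufactured solution u = x(1 - x)e^x is chosen for illustration.

```python
import math
from typing import List


def apply_neg_laplacian(u: List[float], h: float) -> List[float]:
    """Matrix-free second-order approximation of -u'' with u = 0 at both ends."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(b: List[float], h: float, tol: float = 1e-10) -> List[float]:
    """Unpreconditioned CG; applicable because the discrete operator is symmetric positive definite."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for _ in range(10 * len(b)):
        ap = apply_neg_laplacian(p, h)
        alpha = rs / dot(p, ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * apk for rk, apk in zip(r, ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) <= tol * b_norm:
            break
        p = [rk + (rs_new / rs) * pk for rk, pk in zip(r, p)]
        rs = rs_new
    return x


def max_error(n: int) -> float:
    """Solve -u'' = x(3 + x)e^x on (0, 1) with u(0) = u(1) = 0; exact u = x(1 - x)e^x."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    rhs = [x * (3.0 + x) * math.exp(x) for x in xs]
    u = conjugate_gradient(rhs, h)
    return max(abs(uk - x * (1.0 - x) * math.exp(x)) for uk, x in zip(u, xs))
```

Halving the grid spacing (`max_error(49)` versus `max_error(99)`) should cut the maximum error by roughly a factor of four, the signature of a second-order discretization.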

Nonlinear three-dimensional verification of the SPECYL and PIXIE3D magnetohydrodynamics codes for fusion plasmas

NASA Astrophysics Data System (ADS)

Bonfiglio, D.; Chacón, L.; Cappello, S.

2010-08-01

With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such an endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacón, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code in cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.
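The study above verifies two codes against each other; a complementary and widely used verification check is the observed order of accuracy from errors on two grids. The sketch below shows that related technique, not the method of the paper. It applies the check to a central-difference derivative of sin(x), whose expected order is 2; `observed_order` and `central_difference_error` are illustrative names.

```python
import math


def observed_order(err_coarse: float, err_fine: float, ratio: float = 2.0) -> float:
    """p = ln(err_coarse / err_fine) / ln(ratio) for grid spacings differing by `ratio`."""
    return math.log(err_coarse / err_fine) / math.log(ratio)


def central_difference_error(h: float, x: float = 1.0) -> float:
    """Error of the second-order central difference approximation of d/dx sin(x)."""
    approx = (math.sin(x + h) - math.sin(x - h)) / (2.0 * h)
    return abs(approx - math.cos(x))
```

An observed order that matches the scheme's formal order is evidence that the discretization is implemented correctly. A mismatch usually points to a bug or to a solution that is not yet in the asymptotic range.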
Nonlinear three-dimensional verification of the SPECYL and PIXIE3D magnetohydrodynamics codes for fusion plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bonfiglio, Daniele; Chacon, Luis; Cappello, Susanna

2010-01-01

With the increasing impact of scientific discovery via advanced computation, there is presently a strong emphasis on ensuring the mathematical correctness of computational simulation tools. Such an endeavor, termed verification, is now at the center of most serious code development efforts. In this study, we address a cross-benchmark nonlinear verification study between two three-dimensional magnetohydrodynamics (3D MHD) codes for fluid modeling of fusion plasmas, SPECYL [S. Cappello and D. Biskamp, Nucl. Fusion 36, 571 (1996)] and PIXIE3D [L. Chacon, Phys. Plasmas 15, 056103 (2008)], in their common limit of application: the simple viscoresistive cylindrical approximation. SPECYL is a serial code in cylindrical geometry that features a spectral formulation in space and a semi-implicit temporal advance, and has been used extensively to date for reversed-field pinch studies. PIXIE3D is a massively parallel code in arbitrary curvilinear geometry that features a conservative, solenoidal finite-volume discretization in space, and a fully implicit temporal advance. The present study is, in our view, a first mandatory step in assessing the potential of any numerical 3D MHD code for fluid modeling of fusion plasmas. Excellent agreement is demonstrated over a wide range of parameters for several fusion-relevant cases in both two- and three-dimensional geometries.
Memory formation during anaesthesia: plausibility of a neurophysiological basis.

PubMed

Veselis, R A

2015-07-01

As opposed to conscious, personally relevant (explicit) memories that we can recall at will, implicit (unconscious) memories are prototypical of 'hidden' memory: memories that exist but that we do not know we possess. Nevertheless, our behaviour can be affected by these memories; in fact, these memories allow us to function in an ever-changing world. It is still unclear from behavioural studies whether similar memories can be formed during anaesthesia. Thus, a relevant question is whether implicit memory formation is a realistic possibility during anaesthesia, considering the underlying neurophysiology. A different conceptualization of memory taxonomy is presented, Tulving's serial parallel independent model, which focuses on dynamic information processing with interactions among different memory systems rather than on static classification of different types of memories. The neurophysiological basis for subliminal information processing is considered in the context of brain function as embodied in network interactions. The function of sensory cortices and thalamic activity during anaesthesia are reviewed. The role of sensory and perisensory cortices, in particular the auditory cortex, in support of memory function is discussed. Although it is improbable, with current knowledge of neurophysiology one cannot rule out the possibility of memory formation during anaesthesia. © The Author 2015. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved.
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gas cellular automata; and sparse matrix Cholesky factorization.
A Modular Three-Dimensional Finite-Difference Ground-Water Flow Model

USGS Publications Warehouse

McDonald, Michael G.; Harbaugh, Arlen W.; Guo, Weixing; Lu, Guoping

1988-01-01

This report presents a finite-difference model and its associated modular computer program. The model simulates flow in three dimensions. The report includes detailed explanations of physical and mathematical concepts on which the model is based and an explanation of how those concepts are incorporated in the modular structure of the computer program. The modular structure consists of a Main Program and a series of highly independent subroutines called 'modules.' The modules are grouped into 'packages.' Each package deals with a specific feature of the hydrologic system which is to be simulated, such as flow from rivers or flow into drains, or with a specific method of solving linear equations which describe the flow system, such as the Strongly Implicit Procedure or Slice-Successive Overrelaxation. The division of the program into modules permits the user to examine specific hydrologic features of the model independently. This also facilitates development of additional capabilities because new packages can be added to the program without modifying the existing packages. The input and output systems of the computer program are also designed to permit maximum flexibility. Ground-water flow within the aquifer is simulated using a block-centered finite-difference approach. Layers can be simulated as confined, unconfined, or a combination of confined and unconfined. Flow associated with external stresses, such as wells, areal recharge, evapotranspiration, drains, and streams, can also be simulated. The finite-difference equations can be solved using either the Strongly Implicit Procedure or Slice-Successive Overrelaxation. The program is written in FORTRAN 77 and will run without modification on most computers that have a FORTRAN 77 compiler. For each program module, this report includes a narrative description, a flow chart, a list of variables, and a module listing.
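A block-centered finite-difference head solution with successive overrelaxation can be sketched for the simplest case: one confined 2D layer with uniform transmissivity, constant-head cells in the first and last columns, and no-flow top and bottom edges. This is point SOR, not the report's slice-SOR or SIP solvers, and the function name, grid size, and `omega` value are hypothetical.

```python
from typing import List, Tuple


def solve_heads_sor(nrow: int, ncol: int, h_left: float, h_right: float,
                    omega: float = 1.5, tol: float = 1e-10,
                    max_iter: int = 10000) -> Tuple[List[List[float]], int]:
    """Steady confined flow on a block-centered grid with uniform transmissivity.

    Columns 0 and ncol-1 are constant-head cells; top and bottom edges are no-flow,
    so edge cells simply have fewer neighbours in their flow balance.
    """
    start = 0.5 * (h_left + h_right)
    h = [[h_left if c == 0 else h_right if c == ncol - 1 else start
          for c in range(ncol)] for _ in range(nrow)]
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for r in range(nrow):
            for c in range(1, ncol - 1):
                nbrs = [h[r][c - 1], h[r][c + 1]]
                if r > 0:
                    nbrs.append(h[r - 1][c])
                if r < nrow - 1:
                    nbrs.append(h[r + 1][c])
                # Equal conductances: sum(C * (h_n - h)) = 0 gives h = mean of neighbours.
                gauss_seidel = sum(nbrs) / len(nbrs)
                new = h[r][c] + omega * (gauss_seidel - h[r][c])
                max_change = max(max_change, abs(new - h[r][c]))
                h[r][c] = new
        if max_change < tol:
            return h, it
    raise RuntimeError("SOR did not converge")
```

With these boundary conditions the exact head varies linearly between the two constant-head columns, which makes a convenient check. The over-relaxation factor `omega` between 1 and 2 speeds convergence relative to plain Gauss-Seidel (`omega = 1`).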
Describing, using 'recognition cones'. [parallel-series model with English-like computer program]

NASA Technical Reports Server (NTRS)

Uhr, L.

1973-01-01

A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.
PISCES: An environment for parallel scientific computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
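The abstract mentions an iterative solution of a system of equations, and Jacobi iteration is a natural fit for task-level parallelism. The Pisces example itself is not reproduced here, and choosing Jacobi is an assumption. In each sweep every unknown is updated only from the previous iterate, so the n updates are independent and could be handed to separate tasks. The `jacobi` name and the small test system are illustrative.

```python
from typing import List, Tuple


def jacobi(A: List[List[float]], b: List[float], tol: float = 1e-12,
           max_iter: int = 1000) -> Tuple[List[float], int]:
    """Jacobi iteration for A x = b (converges for strictly diagonally dominant A).

    Each component of x_new depends only on the previous iterate x, so all
    components of one sweep could be computed concurrently.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")
```

For a distributed array, each task would own a block of rows and need only the previous iterate's values, which is the kind of access that a window onto a large shared array could provide.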
Virtual earthquake engineering laboratory with physics-based degrading materials on parallel computers

NASA Astrophysics Data System (ADS)

Cho, In Ho

For the last few decades, we have obtained tremendous insight into the underlying microscopic mechanisms of degrading quasi-brittle materials from persistent, painstaking laboratory efforts, and at the same time we have seen unprecedented evolution in computational technology such as massively parallel computers. Thus, the time is ripe to embark on a novel approach to settling unanswered questions, especially for the earthquake engineering community, by harmoniously combining the microphysical mechanisms with advanced parallel computing technology. To begin with, we placed a great deal of emphasis on preserving the clear meaning and physical counterparts of all the microscopic material models proposed herein, since this is directly tied to the belief that the more physical mechanisms we incorporate, the better the predictions we can obtain. We began by reviewing representative microscopic analysis methodologies and selected the "fixed-type" multidirectional smeared crack model as the base framework for nonlinear quasi-brittle materials, since it is widely believed to best retain the physical nature of actual cracks. Microscopic stress functions are proposed by integrating well-received existing models to update normal stresses on the crack surfaces (three orthogonal crack surfaces are allowed to initiate) under cyclic loading. Unlike the normal stress update, the shear stress update on the crack surfaces required special attention, primarily because of the well-known pathology of the fixed-type smeared crack model: spurious large stress transfer across the open crack under nonproportional loading. In the hope of resolving this deleterious behavior of the fixed crack model through a physical mechanism, a tribology-inspired three-dimensional (3d) interlocking mechanism has been proposed.
Following the main trend of tribology (i.e., the science and engineering of interacting surfaces), we introduced a base fabric of solid particles in a soft matrix to explain realistic interlocking over rough crack surfaces, and the adopted Gaussian distribution assigns random particle sizes throughout the domain. Validation against a well-documented rough-crack experiment shows promising accuracy for the proposed 3d interlocking model. A consumed-energy-based damage model has been proposed to capture the weak correlation between the normal and shear stresses on the crack surfaces and to describe the nature of irrecoverable damage. Since the consumed energy is directly linked to the microscopic deformation, which can be efficiently tracked on the crack surfaces, the proposed damage model is believed to provide a more physical interpretation than existing damage mechanics, which fundamentally stem from mathematical derivation with few physical counterparts. Another novel point of the present work lies in the topological transition-based "smart" steel bar model, notably with an evolving compressive buckling length. We presented a systematic framework for the information flow between the key ingredients of the composite material (i.e., the steel bar and its surrounding concrete elements). The proposed smart steel model can incorporate smooth transitions during load reversal, tensile rupture, early buckling after reversal from excessive tensile loading, and compressive buckling. In particular, the buckling length evolves according to the damage states of the elements surrounding each bar, whereas all other dominant models keep the length unchanged. Underlying all of these novel attempts is, of course, the problem-optimized parallel platform. In fact, parallel computing in our field has been restricted to monotonic shock or blast loading with explicit algorithms, which are inherently easy to parallelize.
In the present study, efficient parallelization strategies are proposed for the highly demanding implicit nonlinear finite element analysis (FEA) of real-scale reinforced concrete (RC) structures under cyclic loading. A quantitative comparison of state-of-the-art parallel strategies, in terms of factorization, was carried out, leading to a problem-optimized solver that successfully exploits the penalty method and the banded structure of the system. In particular, the penalty method imparts considerable smoothness to the global response, which gives the parallel triangular system solver a practical advantage over other advanced solvers such as the parallel preconditioned conjugate gradient method. Other salient parallelization issues are also addressed. The resulting parallel platform offers unprecedented access to simulations of real-scale structures, giving new understanding of the physics-based mechanisms adopted and of probabilistic randomness at the entire system level. In particular, the platform enables bold simulations of real-scale RC structures exposed to cyclic loading: an H-shaped wall system and a 4-story T-shaped wall system. The simulations demonstrate accurate prediction of global force-displacement responses, post-peak softening behavior, and compressive buckling of longitudinal steel bars. Notably, the intrinsic randomness of the 3d interlocking model appears to cause "localized" damage in the real-scale structures, which is consistent with observations reported in other fields such as granular media. Given the accuracy, stability, and scalability demonstrated so far, the parallel platform is believed to serve as fertile ground for introducing further physical mechanisms into various research fields as well as the earthquake engineering community. In the near future, it can be further expanded to run in concert with reliable FEA programs such as FRAME3d or OPENSEES.
Following the central notion of the "multiscale" analysis technique, actual infrastructures exposed to extreme natural hazards can be successfully tackled by this next-generation analysis tool: the harmonious union of the parallel platform and a general FEA program. At the same time, any type of experiment can be easily conducted in this "virtual laboratory."
A NUMERICAL ALGORITHM FOR MODELING MULTIGROUP NEUTRINO-RADIATION HYDRODYNAMICS IN TWO SPATIAL DIMENSIONS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Swesty, F. Douglas; Myra, Eric S.

It is now generally agreed that multidimensional, multigroup, neutrino-radiation hydrodynamics (RHD) is an indispensable element of any realistic model of stellar-core collapse, core-collapse supernovae, and proto-neutron star instabilities. We have developed a new, two-dimensional, multigroup algorithm that can model neutrino-RHD flows in core-collapse supernovae. Our algorithm uses an approach similar to the ZEUS family of algorithms, originally developed by Stone and Norman. However, this completely new implementation extends that previous work in three significant ways: first, we incorporate multispecies, multigroup RHD in a flux-limited-diffusion approximation. Our approach is capable of modeling pair-coupled neutrino-RHD, and includes effects of Pauli blocking in the collision integrals. Blocking gives rise to nonlinearities in the discretized radiation-transport equations, which we evolve implicitly in time. We employ parallelized Newton-Krylov methods to obtain a solution of these nonlinear, implicit equations. Our second major extension to the ZEUS algorithm is the inclusion of an electron conservation equation that describes the evolution of electron-number density in the hydrodynamic flow. This permits calculating deleptonization of a stellar core. Our third extension modifies the hydrodynamics algorithm to accommodate realistic, complex equations of state, including those having nonconvex behavior. In this paper, we present a description of our complete algorithm, giving sufficient details to allow others to implement, reproduce, and extend our work. Finite-differencing details are presented in appendices. We also discuss implementation of this algorithm on state-of-the-art, parallel-computing architectures. Finally, we present results of verification tests that demonstrate the numerical accuracy of this algorithm on diverse hydrodynamic, gravitational, radiation-transport, and RHD sample problems.
We believe our methods to be of general use in a variety of model settings where radiation transport or RHD is important. Extension of this work to three spatial dimensions is straightforward.
Moose: An Open-Source Framework to Enable Rapid Development of Collaborative, Multi-Scale, Multi-Physics Simulation Tools

NASA Astrophysics Data System (ADS)

Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.

2014-12-01

The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org) is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse, highly coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++ and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model that combines shared-memory (thread-based) and distributed-memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows simulations to be expanded (via coupling of additional physics modules) and multiscale simulations to be created. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field microstructure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
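The abstract names the Jacobian-free Newton-Krylov (JFNK) method. As a minimal illustration of the idea only (this is not MOOSE code, which delegates its nonlinear solves to PETSc), the Python/NumPy sketch below never forms the Jacobian: each Krylov iteration needs only a product J(u)v, approximated by the forward difference (F(u + eps*v) - F(u))/eps. The function names `jfnk` and `gmres_matfree`, the fixed step `eps`, and the unrestarted, unpreconditioned GMRES are all simplifying assumptions made for illustration.

```python
import numpy as np


def gmres_matfree(matvec, b, m):
    """Unrestarted GMRES: minimize ||b - A x|| over a Krylov space of
    dimension <= m, using only the action matvec(v) = A v."""
    beta = np.linalg.norm(b)
    Q = np.zeros((b.size, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / beta
    k_used = m
    for k in range(m):
        w = matvec(Q[:, k])
        for j in range(k + 1):  # Arnoldi with modified Gram-Schmidt
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-14:  # Krylov space exhausted
            k_used = k + 1
            break
        Q[:, k + 1] = w / H[k + 1, k]
    rhs = np.zeros(k_used + 1)
    rhs[0] = beta
    # Small least-squares problem on the upper Hessenberg matrix
    y, *_ = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)
    return Q[:, :k_used] @ y


def jfnk(F, u0, tol=1e-10, max_newton=20, eps=1e-7):
    """Newton's method in which each linear solve J(u) du = -F(u) is done
    by GMRES with finite-difference Jacobian-vector products."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break

        # J(u) v ~ (F(u + eps v) - F(u)) / eps; no Jacobian matrix is stored.
        def jv(v, u=u, r=r):
            return (F(u + eps * v) - r) / eps

        u = u + gmres_matfree(jv, -r, u.size)
    return u
```

For example, applying `jfnk` to F(u) = (u0^2 + u1^2 - 4, u0 - u1) from the starting point (1, 0.5) should converge to (sqrt(2), sqrt(2)). Production JFNK solvers typically add preconditioning, restarted GMRES, a line search, and a scaled choice of `eps`; these are omitted to keep the sketch short.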
Parallel Aircraft Trajectory Optimization with Analytic Derivatives

NASA Technical Reports Server (NTRS)

Falck, Robert D.; Gray, Justin S.; Naylor, Bret

2016-01-01

Trajectory optimization is an integral component of the design of aerospace vehicles, but emerging aircraft technologies have introduced new demands on trajectory analysis that current tools are not well suited to address. Designing aircraft with technologies such as hybrid electric propulsion and morphing wings requires consideration of the operational behavior as well as the physical design characteristics of the aircraft. The addition of operational variables can dramatically increase the number of design variables, which motivates the use of gradient-based optimization with analytic derivatives to solve the larger optimization problems. In this work we develop an aircraft trajectory analysis tool using a Legendre-Gauss-Lobatto-based collocation scheme, providing analytic derivatives via the OpenMDAO multidisciplinary optimization framework. This collocation method uses an implicit time integration scheme that provides a high degree of sparsity and thus several potential options for parallelization. The performance of the new implementation was investigated via a series of single- and multi-trajectory optimizations using a combination of parallel computing and constraint aggregation. The computational performance results show that to take full advantage of the sparsity in the problem, it is vital to parallelize both the nonlinear analysis evaluations and the derivative computations themselves. The constraint aggregation results revealed a significant numerical challenge: tight convergence tolerances were difficult to achieve. Overall, the results demonstrate the value of applying analytic derivatives to trajectory optimization problems and lay the foundation for future application of this collocation-based method to the design of aircraft where operational scheduling of technologies is key to achieving good performance.
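The abstract's collocation scheme is built on Legendre-Gauss-Lobatto (LGL) points. As background only (neither OpenMDAO's nor the authors' implementation is reproduced here), the N + 1 LGL nodes on [-1, 1] are the endpoints -1 and 1 plus the roots of P_N'(x), the derivative of the degree-N Legendre polynomial, and the associated quadrature weights are w_i = 2 / (N(N+1) P_N(x_i)^2). The NumPy sketch below computes both; the function name `lgl_nodes_weights` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def lgl_nodes_weights(n_points):
    """Legendre-Gauss-Lobatto nodes and weights on [-1, 1] (n_points >= 2)."""
    N = n_points - 1
    if N < 1:
        raise ValueError("need at least two LGL points")
    P_N = legendre.Legendre.basis(N)
    # Interior nodes are the roots of P_N'(x); the endpoints are always included.
    interior = np.sort(P_N.deriv().roots().real)
    x = np.concatenate(([-1.0], interior, [1.0]))
    # Standard LGL weight formula evaluated at every node
    w = 2.0 / (N * (N + 1) * P_N(x) ** 2)
    return x, w
```

With N + 1 nodes the resulting quadrature is exact for polynomials up to degree 2N - 1, and including both interval endpoints as nodes is convenient when neighboring collocation segments must be linked.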
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues of a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, and uses less memory, than other comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
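For readers unfamiliar with the Lanczos algorithm named above, here is a small serial NumPy sketch; it is not the JPL program, which is written in C with MPI alongside PARPACK. It builds a real symmetric tridiagonal matrix T whose eigenvalues (Ritz values) approximate the extreme eigenvalues of a Hermitian matrix A, using only matrix-vector products, which is what makes the method attractive for very large sparse matrices. Full reorthogonalization is an assumption chosen for robustness in a short example; large-scale codes often use cheaper strategies. The name `lanczos_eigs` is hypothetical.

```python
import numpy as np


def lanczos_eigs(matvec, n, k, v0=None):
    """Return the Ritz values from k Lanczos steps on a Hermitian operator.

    matvec(x) must return A @ x for a Hermitian n-by-n matrix A; only these
    products are needed, so A itself may be sparse or implicit.
    """
    q = np.ones(n, dtype=complex) if v0 is None else np.asarray(v0, dtype=complex)
    Q = [q / np.linalg.norm(q)]
    alphas, betas = [], []
    beta = 0.0
    q_prev = np.zeros(n, dtype=complex)
    for j in range(k):
        # Three-term recurrence: w = A q_j - beta_{j-1} q_{j-1}
        w = matvec(Q[j]) - beta * q_prev
        alpha = np.vdot(Q[j], w).real  # diagonal entries are real for Hermitian A
        w = w - alpha * Q[j]
        # Full reorthogonalization (costly, but keeps this small sketch stable)
        for qi in Q:
            w = w - np.vdot(qi, w) * qi
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-12:  # done, or an invariant subspace was found
            break
        betas.append(beta)
        q_prev = Q[j]
        Q.append(w / beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.linalg.eigvalsh(T)
```

With k much smaller than n, the largest and smallest returned values are typically the first to converge, which matches the use case of finding "a few eigenvalues" of a very large matrix.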
Parallelization of elliptic solver for solving 1D Boussinesq model

NASA Astrophysics Data System (ADS)

Tarwidi, D.; Adytia, D.

2018-03-01

In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered-grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against analytical solutions for the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. Moreover, the best efficiencies of the parallel program are 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
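The abstract solves its tridiagonal system with cyclic reduction parallelized via OpenMP. The serial Python sketch below is an illustration, not the authors' code; it shows why the method parallelizes: within each reduction level, every updated equation reads only from neighbors that are not modified at that level, so the inner `for i` loops are the ones a shared-memory version could split across threads. It assumes n = 2^p - 1 unknowns and a system that needs no pivoting (for example, a diagonally dominant one); the function name `cyclic_reduction` is hypothetical.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system: sub-diagonal a, diagonal b, super-diagonal c,
    right-hand side d (a[0] and c[-1] are ignored). Requires n = 2**p - 1."""
    n = len(b)
    if (n + 1) & n:
        raise ValueError("this sketch assumes n = 2**p - 1")
    a, b, c, d = [list(map(float, v)) for v in (a, b, c, d)]
    a[0] = 0.0
    c[-1] = 0.0
    # Forward reduction: at stride s, equation i eliminates its neighbors
    # i - s and i + s. The updates within one level are independent.
    s = 1
    while 2 * s <= (n + 1) // 2:
        for i in range(2 * s - 1, n, 2 * s):
            lo, hi = i - s, i + s
            alpha = -a[i] / b[lo]
            gamma = -c[i] / b[hi]
            b[i] += alpha * c[lo] + gamma * a[hi]
            d[i] += alpha * d[lo] + gamma * d[hi]
            a[i] = alpha * a[lo]
            c[i] = gamma * c[hi]
        s *= 2
    # Back substitution: start from the single remaining equation and
    # recover the unknowns level by level, down to stride 1.
    x = [0.0] * n
    s = (n + 1) // 2
    while s >= 1:
        for i in range(s - 1, n, 2 * s):
            left = x[i - s] if i - s >= 0 else 0.0
            right = x[i + s] if i + s < n else 0.0
            x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
        s //= 2
    return x
```

Each level halves the number of active equations, so there are about log2(n) levels; the shrinking amount of work per level is one reason shared-memory speedups for cyclic reduction tend to be modest compared with the thread count.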
3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

1997-12-31

The aim of this work is to develop a 3D parallel program for the numerical calculation of gas dynamics problems with heat conductivity on distributed-memory computational systems (CS), satisfying the condition that numerical results be independent of the number of processors involved. Two fundamentally different approaches to structuring massively parallel computations have been developed. The first approach uses a 3D data matrix decomposition that is reconstructed at each temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shared memory. The second approach is based on a 3D data matrix decomposition that is not reconstructed during a temporal cycle. The program was developed on the 8-processor CS MP-3 made at VNIIEF and was adapted to the massively parallel CS Meiko-2 at LLNL by the joint efforts of VNIIEF and LLNL staff. A large number of numerical experiments have been carried out with different numbers of processors, up to 256, and the parallelization efficiency has been evaluated as a function of the number of processors and their parameters.
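The requirement that numerical results not depend on the number of processors is hard to meet for floating-point reductions, because changing the summation order changes the rounding. One generic way to meet it, sketched below in Python as an illustration and not as the VNIIEF/LLNL technique (which the abstract does not describe), is to fix the decomposition of the data into blocks independently of the processor count and always combine the block results in the same order. The block size and the function name `reproducible_sum` are assumptions, and the "workers" are simulated by a loop.

```python
def reproducible_sum(values, n_workers, block=4):
    """Sum `values` so that the result is bit-for-bit independent of n_workers."""
    n_blocks = (len(values) + block - 1) // block
    partial = [0.0] * n_blocks
    # Each simulated worker sums whole blocks; block boundaries depend only
    # on `block`, never on n_workers, so every partial sum is identical.
    for w in range(n_workers):
        for k in range(w, n_blocks, n_workers):
            s = 0.0
            for v in values[k * block:(k + 1) * block]:
                s += v
            partial[k] = s
    # Combine the partial sums in a fixed (block-index) order.
    total = 0.0
    for p in partial:
        total += p
    return total
```

The design choice is to decouple the unit of work (a block) from the unit of execution (a worker): assigning blocks to workers only changes who computes each partial sum, never how it is computed or combined.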
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
Relative Debugging of Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2002-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
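The core comparison step of relative debugging can be illustrated with a small Python sketch. It assumes, hypothetically, that both the serial and the parallel runs have recorded a trace of (checkpoint label, values) pairs at matching program points; the system described above instead relies on dynamic instrumentation and on information from the parallelization tool, neither of which this sketch models. A tolerance is used because a correct parallel run can still differ in the last bits, for example when reductions are reordered. The function name `first_divergence` is hypothetical.

```python
import math


def first_divergence(serial_trace, parallel_trace, rel_tol=1e-12, abs_tol=0.0):
    """Return (label, index) of the first value that differs beyond tolerance,
    (label, None) if a checkpoint's array lengths differ, or None if the
    traces agree. Comparison stops at the end of the shorter trace."""
    for (label_s, vals_s), (label_p, vals_p) in zip(serial_trace, parallel_trace):
        if label_s != label_p:
            raise ValueError(f"traces out of step: {label_s!r} vs {label_p!r}")
        if len(vals_s) != len(vals_p):
            return label_s, None
        for idx, (u, v) in enumerate(zip(vals_s, vals_p)):
            if not math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol):
                return label_s, idx
    return None
```

Reporting the first divergent checkpoint, rather than every difference, mirrors the goal stated in the abstract: if the serial code is correct, the earliest point of divergence is where a parallelization error should be sought.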
Paralex: An Environment for Parallel Programming in Distributed Systems

DTIC Science & Technology

1991-12-07

...distributed systems is comparable to assembly language programming for traditional sequential systems: the user must resort to low-level primitives ... to accomplish data encoding/decoding, communication, remote execution, synchronization, failure detection and recovery. It is our belief that... synchronization. Finally, composing parallel programs by interconnecting sequential computations allows automatic support for heterogeneity and fault tolerance
Interfacing Computer Aided Parallelization and Performance Analysis

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

2003-01-01

When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
Efficient Iterative Methods Applied to the Solution of Transonic Flows

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.

1996-02-01

We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian, with preconditioned conjugate gradient-like iterative solvers for the linear systems in each Newton iteration. Two iterative solvers are tested: a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin), and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. The main issue addressed is the efficiency of the Newton-iterative method on vector and parallel computer architectures. In vectorized tests on a single processor of the Cray C-90, Newton-OSOmin outperforms both Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel Thinking Machines CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver, called parPCx, and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods, including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas, and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer).
We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.

Improving the Efficiency of Non-equilibrium Sampling in the Aqueous Environment via Implicit-Solvent Simulations.

PubMed

Liu, Hui; Chen, Fu; Sun, Huiyong; Li, Dan; Hou, Tingjun

2017-04-11

By means of estimators based on non-equilibrium work, equilibrium free energy differences or potentials of mean force (PMFs) of a system of interest can be computed from biased molecular dynamics (MD) simulations. The approach, however, is often plagued by slow conformational sampling and poor convergence, especially when the solvent effects are taken into account. Here, as a possible way to alleviate the problem, several widely used implicit-solvent models, which are derived from the analytic generalized Born (GB) equation and implemented in the AMBER suite of programs, were employed in free energy calculations based on non-equilibrium work and evaluated for their abilities to emulate explicit water. As a test case, pulling MD simulations were carried out on an alanine polypeptide with different solvent models and protocols, followed by comparisons of the reconstructed PMF profiles along the unfolding coordinate. The results show that when employing the non-equilibrium work method, sampling with an implicit-solvent model is several times faster and, more importantly, converges more rapidly than that with explicit water due to reduction of dissipation. Among the assessed GB models, the Neck variants outperform the OBC and HCT variants in terms of accuracy, whereas their computational costs are comparable. In addition, for the best-performing models, the impact of the solvent-accessible surface area (SASA) dependent nonpolar solvation term was also examined. The present study highlights the advantages of implicit-solvent models for non-equilibrium sampling.
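The abstract does not say which non-equilibrium work estimator was used. One widely used estimator of this kind is the Jarzynski equality, dF = -kT ln < exp(-W/kT) >, where the average runs over repeated pulling trajectories with work values W. The Python sketch below evaluates it with a log-sum-exp shift for numerical stability. It is an illustration under that assumption, not the authors' workflow; the function name `jarzynski_free_energy` is hypothetical, and `works` and `kT` must be given in the same energy units.

```python
import math


def jarzynski_free_energy(works, kT):
    """Free-energy difference from non-equilibrium work values via the
    Jarzynski equality, dF = -kT * ln(mean(exp(-W / kT)))."""
    if not works:
        raise ValueError("need at least one work value")
    scaled = [-w / kT for w in works]
    m = max(scaled)  # shift so the largest exponent is exp(0)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kT * log_mean
```

By Jensen's inequality the estimate never exceeds the mean work, and the exponential average is dominated by rare low-work trajectories; this is one reason the convergence of such estimators, which the abstract highlights, is sensitive to dissipation.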
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, parallel processing technology has been applied to commercial software only to an extremely small extent, even though numerous computationally demanding programs would benefit significantly from it. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Dual and parallel postdoctoral training programs: implications for the osteopathic medical profession.

PubMed

Burkhart, Diane N; Lischka, Terri A

2011-04-01

Students in colleges of osteopathic medicine have several options when considering postdoctoral training programs. In addition to training programs approved solely by the American Osteopathic Association or accredited solely by the Accreditation Council for Graduate Medical Education (ACGME), students can pursue programs accredited by both organizations (ie, dually accredited programs) or osteopathic programs that occur side-by-side with ACGME programs (ie, parallel programs). In the present article, we report on the availability and growth of these 2 training options and describe their benefits and drawbacks for trainees and the osteopathic medical profession as a whole.
Measuring implicit attitudes: A positive framing bias flaw in the Implicit Relational Assessment Procedure (IRAP).

PubMed

O'Shea, Brian; Watson, Derrick G; Brown, Gordon D A

2016-02-01

How can implicit attitudes best be measured? The Implicit Relational Assessment Procedure (IRAP), unlike the Implicit Association Test (IAT), claims to measure absolute, not just relative, implicit attitudes. In the IRAP, participants make congruent (Fat Person-Active: false; Fat Person-Unhealthy: true) or incongruent (Fat Person-Active: true; Fat Person-Unhealthy: false) responses in different blocks of trials. IRAP experiments have reported positive or neutral implicit attitudes (e.g., neutral attitudes toward fat people) in cases in which negative attitudes are normally found on explicit or other implicit measures. It was hypothesized that these results might reflect a positive framing bias (PFB) that occurs when participants complete the IRAP. Implicit attitudes toward categories with varying prior associations (nonwords, social systems, flowers and insects, thin and fat people) were measured. Three conditions (standard, positive framing, and negative framing) were used to measure whether framing influenced estimates of implicit attitudes. It was found that IRAP scores were influenced by how the task was framed to the participants, that the framing effect was modulated by the strength of prior stimulus associations, and that a default PFB led to an overestimation of positive implicit attitudes when measured by the IRAP. Overall, the findings question the validity of the IRAP as a tool for the measurement of absolute implicit attitudes. A new tool (Simple Implicit Procedure: SIP) for measuring absolute, not just relative, implicit attitudes is proposed. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Integrating Engineering into K-6 Curriculum: Developing Talent in the STEM Disciplines

ERIC Educational Resources Information Center

Mann, Eric L.; Mann, Rebecca L.; Strutz, Michele L.; Duncan, Daphne; Yoon, So Yoon

2011-01-01

The fields of gifted and engineering education share many common interests, and their students share many common attributes. Infusing and making engineering implicit in the K-6 education programs creates opportunities to develop concepts, skills, and habits of the mind that are valuable in all disciplines while providing opportunities to discover…
Manifestations of Hidden Curriculum in a Community College Online Opticianry Program: An Ecological Approach

ERIC Educational Resources Information Center

Hubbard, Barry

2010-01-01

Understanding the influential factors at work within an online learning environment is a growing area of interest. Hidden or implicit expectations, skill sets, knowledge, and social process can help or hinder student achievement, belief systems, and persistence. This qualitative study investigated how hidden curricular issues transpired in an…
Sketching Some Postmodern Alternatives: Beyond Paradigms and Research Programs as Referents for Science Education.

ERIC Educational Resources Information Center

Geelan, David R.

2000-01-01

Suggests that Kuhn's and Lakatos' schemes for the philosophy of science have been pervasive metaphors for conceptual change approaches to the learning and teaching of science, and have been used both implicitly and explicitly to provide an organizing framework and justification matrix for those perspectives. Describes four alternative perspectives…
New Instruments for Studying the Impacts of Science Teacher Professional Development

ERIC Educational Resources Information Center

Trygstad, Peggy J.; Banilower, Eric R.; Smith, P. Sean; Nelson, Courtney L.

2014-01-01

The logic model that implicitly drives most professional development (PD) efforts asserts that PD leads to changes in teacher knowledge and beliefs, which leads to improved classroom practice, and ultimately, better student outcomes. However, efforts to study the impacts of PD programs are often hampered by the scarcity of high-quality…
36 CFR 701.5 - Policy on authorized use of the Library name, seal, or logo.

Code of Federal Regulations, 2011 CFR

2011-07-01

... officially to represent the Library of Congress and its programs, projects, functions, activities, or... in cooperative activities. Use of the Library name or logo in any context suggesting an explicit or implicit endorsement may be approved in only those instances where the Library has sufficient control over...
36 CFR 701.5 - Policy on authorized use of the Library name, seal, or logo.

Code of Federal Regulations, 2010 CFR

2010-07-01

... officially to represent the Library of Congress and its programs, projects, functions, activities, or... in cooperative activities. Use of the Library name or logo in any context suggesting an explicit or implicit endorsement may be approved in only those instances where the Library has sufficient control over...
Synergies and Balance between Values Education and Quality Teaching

ERIC Educational Resources Information Center

Lovat, Terence J.

2010-01-01

The article will focus on the implicit values dimension that is evident in research findings concerning quality teaching. Furthermore, it sets out to demonstrate that maximizing the effects of quality teaching requires explicit attention to this values dimension and that this can be achieved through a well-crafted values education program.…
Moving from Explicit to Implicit: A Case Study of Improving Inferential Comprehension

ERIC Educational Resources Information Center

Yeh, Yi-Fen; McTigue, Erin M.; Joshi, R. Malatesha

2012-01-01

The article describes a successful intervention program in developing inferential comprehension in a sixth grader. Steve (pseudonym) was proficient in word reading, was able to detect explicit information while reading, but struggled with linking textual information to yield integral ideas. After 10 weeks of working with Steve on word analogies,…
Positive Youth Development from Sport to Life: Explicit or Implicit Transfer?

ERIC Educational Resources Information Center

Turnnidge, Jennifer; Côté, Jean; Hancock, David J.

2014-01-01

While previous studies indicate that participation in sport has the potential to facilitate positive developmental outcomes, there is a lack of consensus regarding the possible transfer of these outcomes to other environments (i.e., school or work). An important issue within the positive development literature concerns how sport programs should…
Infant-Directed Media: An Analysis of Product Information and Claims

ERIC Educational Resources Information Center

Fenstermacher, Susan K.; Barr, Rachel; Salerno, Katherine; Garcia, Amaya; Shwery, Clay E.; Calvert, Sandra L.; Linebarger, Deborah L.

2010-01-01

Infant DVDs typically have titles and even company names that imply some educational benefit. It is not known whether these educational claims are reflected in actual content. The present study examined this question. Of 686 claims (across 58 programs) listed on packaging, websites and promotional materials, implicit claims were most frequent…
Gatekeeping and Competency-Based Education: Developing Behaviorally Specific Remediation Policies

ERIC Educational Resources Information Center

Hylton, Mary E.; Manit, Jill; Messick-Svare, Gloria

2017-01-01

Gatekeeping has long been an integral component of what is now referred to as the Implicit Curriculum, or the context in which professional social work education occurs. Despite its long-standing role within social work education, gatekeeping elicits conflict for both individual faculty members and entire programs of social work education. Much of…
Family Policy in Canada: Some Theoretical Considerations and a Practical Application.

ERIC Educational Resources Information Center

Hepworth, H. Philip

Frequently implicit in Canadian social policy addressing other issues, family policy is generally assumed to be a good thing, is bound up with social structure, and, when made explicit, is prescriptive and potentially embarrassing to government. Historically important as a forerunner of more recent income assistance programs, the provision of…
24 CFR 901.25 - Indicator #4, work orders.

Code of Federal Regulations, 2010 CFR

2010-04-01

... PUBLIC HOUSING MANAGEMENT ASSESSMENT PROGRAM § 901.25 Indicator #4, work orders. This indicator... work orders. Implicit in this indicator is the adequacy of the PHA's work order system in terms of how...
PROTEUS two-dimensional Navier-Stokes computer code, version 1.0. Volume 3: Programmer's reference

NASA Technical Reports Server (NTRS)

Towne, Charles E.; Schwab, John R.; Benson, Thomas J.; Suresh, Ambady

1990-01-01

A new computer code was developed to solve the 2-D or axisymmetric, Reynolds-averaged, unsteady compressible Navier-Stokes equations in strong conservation law form. The thin-layer or Euler equations may also be solved. Turbulence is modeled using an algebraic eddy viscosity model. The objective was to develop a code for aerospace applications that is easy to use and easy to modify. Code readability, modularity, and documentation were emphasized. The equations are written in nonorthogonal body-fitted coordinates, and solved by marching in time using a fully-coupled alternating-direction-implicit procedure with generalized first- or second-order time differencing. All terms are linearized using second-order Taylor series. The boundary conditions are treated implicitly, and may be steady, unsteady, or spatially periodic. Simple Cartesian or polar grids may be generated internally by the program. More complex geometries require an externally generated computational coordinate system. The documentation is divided into three volumes. Volume 3 is the Programmer's Reference, and describes the program structure, the FORTRAN variables stored in common blocks, and the details of each subprogram.
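
In an ADI scheme, each sub-step is implicit in only one coordinate direction, so the implicit work reduces to independent tridiagonal solves along grid lines. The sketch below is a minimal illustration under stated assumptions: it solves a scalar tridiagonal system with the Thomas algorithm and applies it to one backward-Euler diffusion step along a single grid line, with zero values assumed just outside both ends. PROTEUS's fully-coupled procedure solves block-tridiagonal systems for all flow variables together, so this is a simplification, and the names `thomas_solve` and `implicit_diffusion_line` are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper coefficients after forward elimination
    d = [0.0] * n  # modified right-hand side after forward elimination
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution from the last unknown to the first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_diffusion_line(u, r):
    """One backward-Euler diffusion step along a single grid line.

    Solves (1 + 2r) v[i] - r v[i-1] - r v[i+1] = u[i], with v = 0 assumed
    just outside both ends of the line (r = diffusivity * dt / dx**2).
    """
    n = len(u)
    lower = [-r] * n
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * n
    return thomas_solve(lower, diag, upper, list(u))
```

Because the lines within one ADI sweep do not depend on each other, they can be solved concurrently; the following sweep applies the same kind of solve along the other grid direction.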
The paradigm compiler: Mapping a functional language for the connection machine

NASA Technical Reports Server (NTRS)

Dennis, Jack B.

1989-01-01

The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
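
Because the array dimensions are known at compile time, an element-to-processor mapping can be fixed before the program runs. The abstract does not say which mappings the Paradigm Compiler chooses; the sketch below assumes a plain 2D block distribution, one common option, only to show what such a static mapping computes. The helper names are hypothetical.

```python
def block_owner(i, j, shape, grid):
    """Return the (row, col) coordinates of the processing element that owns
    element (i, j) when an array of `shape` is block-distributed over a
    `grid` of processing elements."""
    rows, cols = shape
    prow, pcol = grid
    block_rows = -(-rows // prow)  # ceiling division
    block_cols = -(-cols // pcol)
    return (i // block_rows, j // block_cols)


def load_per_pe(shape, grid):
    """Count how many array elements each processing element receives."""
    counts = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            pe = block_owner(i, j, shape, grid)
            counts[pe] = counts.get(pe, 0) + 1
    return counts
```

For example, `load_per_pe((10, 10), (4, 4))` gives processing element (0, 0) nine elements but (3, 3) only one, the kind of imbalance a compile-time performance analysis would have to weigh.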
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

DOE Office of Scientific and Technical Information (OSTI.GOV)

The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
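
On a graph that is itself a tree (treewidth 1), dynamic programming over a tree decomposition collapses to a two-state recurrence per vertex: the best weight of its subtree when the vertex is in the independent set, and when it is not. The sketch below covers only that special case; INDDGO's general algorithm works over the bags of a tree decomposition of an arbitrary graph, and the function name here is hypothetical.

```python
from collections import defaultdict


def max_weight_independent_set_tree(n, edges, weight, root=0):
    """Maximum total weight of an independent set in a tree with vertices
    0..n-1, given as an edge list, with per-vertex weights."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Iterative DFS gives a preorder: every vertex before its descendants.
    parent = [-1] * n
    seen = [False] * n
    seen[root] = True
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)
    take = [0] * n  # best subtree weight with u in the set
    skip = [0] * n  # best subtree weight with u excluded
    for u in reversed(order):  # children are finished before their parent
        take[u] = weight[u]
        for v in adj[u]:
            if v != parent[u]:
                take[u] += skip[v]
                skip[u] += max(take[v], skip[v])
    return max(take[root], skip[root])
```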

Exploiting loop level parallelism in nonprocedural dataflow programs

NASA Technical Reports Server (NTRS)

Gokhale, Maya B.

1987-01-01

Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.
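
The abstract does not describe its restructuring technique in detail. As an illustration of the general idea only (not necessarily Gokhale's method), the running sum x[i] = x[i-1] + a[i] is a loop-carried recurrence, yet recursive doubling rewrites it as about log2(n) passes, each of which is a loop whose iterations are independent and could run concurrently. The function names are hypothetical.

```python
def sequential_prefix_sum(a):
    """Reference version: each iteration depends on the previous one."""
    out, total = [], 0
    for x in a:
        total += x
        out.append(total)
    return out


def recursive_doubling_prefix_sum(a):
    """Same result in ceil(log2 n) passes; within a pass every element reads
    only values from the previous pass, so the pass is a concurrent loop."""
    x = list(a)
    step = 1
    while step < len(x):
        x = [x[i] + x[i - step] if i >= step else x[i] for i in range(len(x))]
        step *= 2
    return x
```

The restructured form does more total additions (about n log2 n instead of n), trading work for a shorter critical path.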
Tolerant (parallel) Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work-stealing runtime, and report performance results for long-range Hartree-Fock (HF) energy calculations of a large molecule with a high-quality basis set, running on up to 1024 cores of a high-performance cluster. Copyright © 2014 Wiley Periodicals, Inc.
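
X10's runtime balances dynamically created activities by work stealing. As a toy illustration of the idea (not X10's actual scheduler, and with hypothetical names), the model below runs in discrete rounds on one thread: all tasks start on worker 0, an idle worker takes its next task from the bottom of its own deque, and a worker whose deque is empty steals the oldest task from the top of the fullest deque. Task costs are assumed to be positive integers measured in rounds.

```python
from collections import deque


def simulate_work_stealing(task_costs, num_workers):
    """Round-based, single-threaded model of work stealing.

    Returns (rounds until all work is finished, tasks run by each worker).
    """
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(task_costs)  # one worker starts with all the work
    busy = [0] * num_workers      # rounds left on each worker's current task
    executed = [0] * num_workers
    rounds = 0
    while any(deques) or any(busy):
        for w in range(num_workers):
            if busy[w] > 0:
                continue
            if deques[w]:
                cost = deques[w].pop()  # own work: bottom (LIFO) end
            else:
                victim = max(range(num_workers), key=lambda v: len(deques[v]))
                if not deques[victim]:
                    continue  # nothing left anywhere to steal
                cost = deques[victim].popleft()  # steal: top (FIFO) end
            busy[w] = cost
            executed[w] += 1
        busy = [max(b - 1, 0) for b in busy]  # one round of time passes
        rounds += 1
    return rounds, executed
```

With eight unit-cost tasks and four workers, every worker ends up running two tasks, even though all the work started on worker 0.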
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Exploring types of play in an adapted robotics program for children with disabilities.

PubMed

Lindsay, Sally; Lam, Ashley

2018-04-01

Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence the children's ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.
Gifted Students' Implicit Beliefs about Intelligence and Giftedness

ERIC Educational Resources Information Center

Makel, Matthew C.; Snyder, Kate E.; Thomas, Chandler; Malone, Patrick S.; Putallaz, Martha

2015-01-01

Growing attention is being paid to individuals' implicit beliefs about the nature of intelligence. However, implicit beliefs about giftedness are currently underexamined. In the current study, we examined academically gifted adolescents' implicit beliefs about both intelligence and giftedness. Overall, participants' implicit beliefs about…
A three-dimensional application with the numerical grid generation code: EAGLE (utilizing an externally generated surface)

NASA Technical Reports Server (NTRS)

Houston, Johnny L.

1990-01-01

Program EAGLE (Eglin Arbitrary Geometry Implicit Euler) is a multiblock grid generation and steady-state flow solver system. This system combines a boundary-conforming surface generation, a composite block structure grid generation scheme, and a multiblock implicit Euler flow solver algorithm. The three codes are intended to be used sequentially, from the definition of the configuration under study to the flow solution about the configuration. EAGLE was specifically designed to aid in the analysis of both freestream and interference flow field configurations. These configurations can comprise single or multiple bodies, ranging from simple axisymmetric airframes to complex aircraft shapes with external weapons. Each body can be arbitrarily shaped, with or without multiple lifting surfaces. Program EAGLE is written to compile and execute efficiently on any CRAY machine with or without Solid State Disk (SSD) devices. Also, the code uses namelist inputs, which are supported by all CRAY machines using the FORTRAN compiler CFT77. The use of namelist inputs makes it easier for the user to understand the inputs and to operate Program EAGLE. Recently, the code was modified to operate on other computers, especially the Sun Sparc4 workstation. Several two-dimensional grid configurations were completely and successfully developed using EAGLE. Currently, EAGLE is being used for three-dimensional grid applications.
Enhancing both motor and cognitive functioning in Parkinson's disease: Aerobic exercise as a rehabilitative intervention.

PubMed

Duchesne, C; Lungu, O; Nadeau, A; Robillard, M E; Boré, A; Bobeuf, F; Lafontaine, A L; Gheysen, F; Bherer, L; Doyon, J

2015-10-01

Aerobic exercise training (AET) has been shown to provide health benefits in individuals with Parkinson's disease (PD). However, it is not yet known to what extent AET also improves cognitive and procedural learning capacities, which ensure optimal daily functioning. In the current study, we assessed the effects of a 3-month AET program on executive functions (EF), implicit motor sequence learning (MSL) capacity, as well as on different health-related outcome indicators. Twenty healthy controls (HC) and 19 early PD individuals participated in a supervised, high-intensity, stationary recumbent bike-training program (3 times/week for 12 weeks). Exercise prescription started at 20 min (+5 min/week up to 40 min) based on each participant's maximal aerobic power. Before and after AET, EF tests assessed participants' inhibition and flexibility functions, whereas implicit MSL capacity was evaluated using a version of the Serial Reaction Time Task. The AET program was effective, as indicated by significant improvement in aerobic capacity in all participants. Most importantly, AET improved inhibition (but not flexibility) and motor learning skill in both groups. Our results suggest that AET can be a valuable non-pharmacological intervention to promote not only physical fitness in early PD, but also better cognitive and procedural functioning. Copyright © 2015 Elsevier Inc. All rights reserved.
Detailed Aerodynamic Analysis of a Shrouded Tail Rotor Using an Unstructured Mesh Flow Solver

NASA Astrophysics Data System (ADS)

Lee, Hee Dong; Kwon, Oh Joon

The detailed aerodynamics of a shrouded tail rotor in hover has been numerically studied using a parallel inviscid flow solver on unstructured meshes. The numerical method is based on a cell-centered finite-volume discretization and an implicit Gauss-Seidel time integration. The calculation was made for a single blade by imposing a periodic boundary condition between adjacent rotor blades. The grid periodicity was also imposed at the periodic boundary planes to avoid numerical inaccuracy resulting from solution interpolation. The results were compared with available experimental data and those from a disk vortex theory for validation. It was found that realistic three-dimensional modeling is important for the prediction of detailed aerodynamics of shrouded rotors including the tip clearance gap flow.
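
In an implicit time integration, each step requires solving a linear system for the solution update; Gauss-Seidel approximates that solve by sweeping through the unknowns and immediately reusing values already updated in the current sweep. The dense-matrix sketch below illustrates only the sweep itself, with a hypothetical function name; the paper's solver operates on cell data of an unstructured mesh, and the abstract does not give details such as the number of sweeps per time step.

```python
def gauss_seidel(A, b, x0=None, tol=1e-10, max_sweeps=1000):
    """Approximately solve A x = b by Gauss-Seidel sweeps.

    Assumes nonzero diagonal entries; convergence is guaranteed, e.g., for
    strictly diagonally dominant A. Returns (x, number of sweeps used).
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(n):
            # x[j] for j < i already holds this sweep's updated values.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - sigma) / A[i][i]
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            return x, sweep
    return x, max_sweeps
```

Because each update uses its neighbours' newest values, the visiting order matters; a parallel code typically partitions or colours the cells so that cells updated at the same time do not depend on each other.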
Monte Carlo Transport for Electron Thermal Transport

NASA Astrophysics Data System (ADS)

Chenhall, Jeffrey; Cao, Duc; Moses, Gregory

2015-11-01

The iSNB (implicit Schurtz-Nicolai-Busquet) multigroup electron thermal transport method of Cao et al. is adapted into a Monte Carlo transport method in order to better model the effects of non-local behavior. The end goal is a hybrid transport-diffusion method that combines Monte Carlo transport with discrete diffusion Monte Carlo (DDMC). The hybrid method will combine the efficiency of a diffusion method in short mean free path regions with the accuracy of a transport method in long mean free path regions. The Monte Carlo nature of the approach allows the algorithm to be massively parallelized. Work to date on the method will be presented. This work was supported by Sandia National Laboratory - Albuquerque and the University of Rochester Laboratory for Laser Energetics.
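
The parallelism referred to here comes from the independence of particle histories. A deliberately simplified sketch, not the iSNB or DDMC method: it estimates the fraction of normally incident particles that cross a purely absorbing slab (exact answer exp(-σL), with σ the total cross section and L the thickness) in batches, each with its own random seed, so each batch could run on a separate processor and only the batch averages need to be combined. Seeding batches with consecutive integers is a shortcut for illustration; production codes use random-number generators designed for independent parallel streams. The function names are hypothetical.

```python
import math
import random


def transmitted_fraction(num_particles, sigma_t, thickness, seed):
    """Fraction of particles whose sampled free-flight distance exceeds the
    slab thickness. Each history is independent of every other history."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(num_particles):
        distance = rng.expovariate(sigma_t)  # exponential, mean 1 / sigma_t
        if distance > thickness:
            crossed += 1
    return crossed / num_particles


def batched_estimate(num_batches, per_batch, sigma_t, thickness):
    """Average of independent batch estimates; each batch could be
    assigned to a different processor."""
    results = [transmitted_fraction(per_batch, sigma_t, thickness, seed=k)
               for k in range(num_batches)]
    return sum(results) / num_batches


if __name__ == "__main__":
    estimate = batched_estimate(8, 20000, sigma_t=1.0, thickness=1.0)
    print(estimate, math.exp(-1.0))
```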
Rain scavenging of solid rocket exhaust clouds

NASA Technical Reports Server (NTRS)

Dingle, A. N.

1978-01-01

An explicit model for cloud microphysics was developed for application to the problem of co-condensation/vaporization of HCl and H2O in the presence of Al2O3 particulate nuclei. Validity of the explicit model relative to the implicit model, which has been customarily applied to atmospheric cloud studies, was demonstrated by parallel computations of H2O condensation upon (NH4)2SO4 nuclei. A mesoscale predictive model designed to account for the impact of wet processes on atmospheric dynamics is also under development. Input data specifying the equilibrium state of HCl and H2O vapors in contact with aqueous HCl solutions were found to be limited, particularly with respect to temperature range.
Sampling of Protein Folding Transitions: Multicanonical Versus Replica Exchange Molecular Dynamics.

PubMed

Jiang, Ping; Yaşar, Fatih; Hansmann, Ulrich H E

2013-08-13

We compare the efficiency of multicanonical and replica exchange molecular dynamics for the sampling of folding/unfolding events in simulations of proteins with end-to-end β-sheet. In Go-model simulations of the 75-residue MNK6, we observe improvement factors of 30 in the number of folding/unfolding events of multicanonical molecular dynamics over replica exchange molecular dynamics. As an application, we use this enhanced sampling to study the folding landscape of the 36-residue DS119 with an all-atom physical force field and implicit solvent. Here, we find that the rate-limiting step is the formation of the central helix that then provides a scaffold for the parallel β-sheet formed by the two chain ends.
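
For reference, replica exchange runs copies of the system at several temperatures and periodically proposes swapping the configurations held by two replicas, accepting the swap with the Metropolis probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T). The sketch shows only that acceptance test in its textbook form; it is not the authors' code, and the function names are hypothetical.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the configurations of replicas
    at inverse temperatures beta_i and beta_j with energies E_i and E_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_acceptance(beta_i, beta_j, energy_i, energy_j)
```

A swap that moves the higher-energy configuration to the hotter replica is always accepted; the reverse move is accepted with an exponentially small probability.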
An analysis of a nonlinear instability in the implementation of a VTOL control system

NASA Technical Reports Server (NTRS)

Weber, J. M.

1982-01-01

The contributions to nonlinear behavior and unstable response of the model-following yaw control system of a VTOL aircraft during hover were determined. The system was designed as a state rate feedback implicit model follower that provided yaw rate command/heading hold capability and used combined full-authority parallel and limited-authority series servo actuators to generate an input to the yaw reaction control system of the aircraft. Both linear and nonlinear system models, as well as describing-function linearization techniques, were used to determine the influence of input magnitude and bandwidth, series servo authority, and system bandwidth on the control system instability. Results of the analysis describe stability boundaries as a function of these system design characteristics.
Implicit and explicit self-esteem in currently depressed individuals with and without suicidal ideation.

PubMed

Franck, Erik; De Raedt, Rudi; Dereu, Mieke; Van den Abbeele, Dirk

2007-03-01

In the present study, we have further explored implicit self-esteem in currently depressed individuals. Since suicidal ideation is associated with lower self-esteem in depressed individuals, we measured both implicit and explicit self-esteem in a population of currently depressed (CD) individuals, with and without suicidal ideation (SI), and in a group of non-depressed controls (ND). The results indicate that only CD individuals with SI show a discrepancy between their implicit and explicit self-esteem: that is, they exhibit high implicit and low explicit self-esteem. CD individuals without SI exhibit both low implicit and low explicit self-esteem; and ND controls exhibit both normal implicit and normal explicit self-esteem. These results provide new insights into the study of implicit self-esteem and the combination of implicit and explicit self-esteem in depression.
I like myself but I don't know why: enhancing implicit self-esteem by subliminal evaluative conditioning.

PubMed

Dijksterhuis, Ap

2004-02-01

On the basis of a conceptualization of implicit self-esteem as the implicit attitude toward the self, it was predicted that implicit self-esteem could be enhanced by subliminal evaluative conditioning. In 5 experiments, participants were repeatedly presented with trials in which the word "I" was paired with positive trait terms. Relative to control conditions, this procedure enhanced implicit self-esteem. The effects generalized across 3 measures of implicit self-esteem (Experiments 1-3). Furthermore, evaluative conditioning enhanced implicit self-esteem among people with low-temporal implicit self-esteem and among people with high-temporal implicit self-esteem (Experiment 4). In addition, it was shown that conditioning enhanced self-esteem to such an extent that it made participants insensitive to negative intelligence feedback (Experiments 5a and 5b). Various implications are discussed.
Implicit cognitive aggression among young male prisoners: Association with dispositional and current aggression.

PubMed

Ireland, Jane L; Adams, Christine

2015-01-01

The current study explores associations between implicit and explicit aggression in young adult male prisoners, seeking to apply the Reflective-Impulsive Model and indicate parity with elements of the General Aggression Model and social cognition. Implicit cognitive aggressive processing is not an area that has been examined among prisoners. Two hundred and sixty-two prisoners completed an implicit cognitive aggression measure (Puzzle Test) and explicit aggression measures, covering current behaviour (DIPC-R) and aggression disposition (AQ). It was predicted that dispositional aggression would be predicted by implicit cognitive aggression, and that implicit cognitive aggression would predict current engagement in aggressive behaviour. It was also predicted that more impulsive implicit cognitive processing would associate with aggressive behaviour whereas cognitively effortful implicit cognitive processing would not. Implicit aggressive cognitive processing was associated with increased dispositional aggression but not current reports of aggressive behaviour. Impulsive implicit cognitive processing of an aggressive nature predicted increased dispositional aggression whereas more cognitively effortful implicit cognitive aggression did not. The article concludes by outlining the importance of accounting for implicit cognitive processing among prisoners and the need to separate such processing into facets (i.e. impulsive vs. cognitively effortful). Implications for future research and practice in this novel area of study are indicated. Copyright © 2015 Elsevier Ltd. All rights reserved.
A decade of studying implicit racial/ethnic bias in healthcare providers using the implicit association test.

PubMed

Maina, Ivy W; Belton, Tanisha D; Ginzberg, Sara; Singh, Ajit; Johnson, Tiffani J

2018-02-01

Disparities in the care and outcomes of US racial/ethnic minorities are well documented. Research suggests that provider bias plays a role in these disparities. The implicit association test enables measurement of implicit bias via tests of automatic associations between concepts. Hundreds of studies have examined implicit bias in various settings, but relatively few have been conducted in healthcare. The aim of this systematic review is to synthesize the current knowledge on the role of implicit bias in healthcare disparities. A comprehensive literature search of several databases between May 2015 and September 2016 identified 37 qualifying studies. Of these, 31 found evidence of pro-White or light-skin/anti-Black, Hispanic, American Indian or dark-skin bias among a variety of healthcare providers (HCPs) across multiple levels of training and disciplines. Fourteen studies examined the association between implicit bias and healthcare outcomes using clinical vignettes or simulated patients. Eight found no statistically significant association between implicit bias and patient care, while six studies found that higher implicit bias was associated with disparities in treatment recommendations, expectations of therapeutic bonds, pain management, and empathy. All seven studies that examined the impact of implicit provider bias on real-world patient-provider interaction found that providers with stronger implicit bias demonstrated poorer patient-provider communication. Two studies examined the effect of implicit bias on real-world clinical outcomes. One found an association and the other did not. Two studies tested interventions aimed at reducing bias, but only one found a post-intervention reduction in implicit bias. This review reveals a need for more research exploring implicit bias in real-world patient care, potential modifiers and confounders of the effect of implicit bias on care, and strategies aimed at reducing implicit bias and improving patient-provider communication. 
Future studies have the opportunity to build on this current body of research, and in doing so will enable us to achieve equity in healthcare and outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.
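
As an illustration of building a future from the kind of primitives listed above, the Python sketch below uses a thread in place of a Smalltalk process and a semaphore that is signalled when the computation finishes; a simple parallel iterator is then one future per element. This is an analogy in another language under those assumptions, not the Multiprocessor Smalltalk code, and the class and function names are hypothetical. Because of CPython's global interpreter lock, the threads here show the structure rather than deliver CPU speedup.

```python
import threading


class Future:
    """A future built from a thread (standing in for a Smalltalk process)
    and a semaphore that is released when the result is ready."""

    def __init__(self, fn, *args):
        self._done = threading.Semaphore(0)
        self._result = None
        self._error = None
        threading.Thread(target=self._run, args=(fn, args), daemon=True).start()

    def _run(self, fn, args):
        try:
            self._result = fn(*args)
        except BaseException as exc:  # hand any failure to whoever waits
            self._error = exc
        finally:
            self._done.release()

    def value(self):
        self._done.acquire()  # block until the computation has finished
        self._done.release()  # leave the semaphore signalled for later callers
        if self._error is not None:
            raise self._error
        return self._result


def parallel_collect(items, fn):
    """A parallel iterator: start one future per element, then gather the
    results in the original order."""
    futures = [Future(fn, item) for item in items]
    return [f.value() for f in futures]
```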
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement, either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load imbalance at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license.

Reference: Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc. Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, Volume 5759, 185-195.
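The even default distribution on entry to a parallel statement can be modelled with a short Python sketch. It is illustrative only: it assumes processors are identified by integer ids and handed out as contiguous, near-equal blocks, and the helper names `block_ranges` and `distribute` are hypothetical, not part of PM or its MPI/OpenMP runtime.

```python
def block_ranges(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def distribute(n_tasks, processors):
    """Hypothetical model of the even default distribution on entering a parallel statement.

    Returns one entry per new task: the list of processor ids that task owns.
    """
    if n_tasks == 0:
        return []
    n_procs = len(processors)
    if n_tasks <= n_procs:
        # Fewer tasks than processors: processors are divided out among the tasks,
        # and each task may subdivide its own set in a nested parallel statement.
        return [[processors[i] for i in r] for r in block_ranges(n_procs, n_tasks)]
    # More tasks than processors: tasks are divided out among the processors,
    # so several tasks share (and are multitasked on) the same processor.
    owners = [None] * n_tasks
    for proc, r in zip(processors, block_ranges(n_tasks, n_procs)):
        for t in r:
            owners[t] = [proc]
    return owners
```

For example, `distribute(2, list(range(8)))` gives each of two tasks four processors, and a nested parallel statement inside the first task could call `distribute(2, [0, 1, 2, 3])` to split that set again into `[[0, 1], [2, 3]]`.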
Unconscious Motivation. Part I: Implicit Attitudes toward L2 Speakers

ERIC Educational Resources Information Center

Al-Hoorie, Ali H.

2016-01-01

This paper reports the first investigation in the second language acquisition field assessing learners' implicit attitudes using the Implicit Association Test, a computerized reaction-time measure. Examination of the explicit and implicit attitudes of Arab learners of English (N = 365) showed that, particularly for males, implicit attitudes toward…

Parallel transformation of K-SVD solar image denoising algorithm

NASA Astrophysics Data System (ADS)

Liang, Youwen; Tian, Yu; Li, Mei

2017-02-01

Images obtained by observing the sun through a large telescope always suffer from noise due to the low signal-to-noise ratio (SNR). The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data-parallel model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. The program achieves a speedup of 13.563 using 16 cores. The parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easy to port to other multi-core platforms.
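The reported speedup can also be expressed as a parallel efficiency. If one further assumes, as Amdahl's law does, that the shortfall from linear speedup comes only from a fixed serial fraction, the Karp-Flatt metric estimates that fraction. Both are standard derived quantities computed here from the stated figures, not values reported in the paper:

```latex
E = \frac{S}{p} = \frac{13.563}{16} \approx 0.848,
\qquad
e = \frac{1/S - 1/p}{1 - 1/p} = \frac{1/13.563 - 1/16}{1 - 1/16} \approx 0.012
```

That is, the 16-core run reaches roughly 85% of ideal linear speedup, consistent with a serial (or otherwise non-scaling) fraction of about 1.2% under the Amdahl assumption.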
Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

PubMed

Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

2016-01-01

Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Computation of Reacting Flows in Combustion Processes

NASA Technical Reports Server (NTRS)

Keith, Theo G., Jr.; Chen, Kuo-Huey

1997-01-01

The main objective of this research was to develop an efficient three-dimensional computer code for chemically reacting flows. The main computer code developed is ALLSPD-3D, a program for the calculation of three-dimensional, chemically reacting flows with sprays. The ALLSPD code employs a coupled, strongly implicit solution procedure for turbulent spray combustion flows. A stochastic droplet model and an efficient method for treatment of the spray source terms in the gas-phase equations are used to calculate the evaporating liquid sprays. The chemistry treatment in the code is general enough that an arbitrary number of reactions and species can be defined by the users. The code is also written in generalized curvilinear coordinates, with both multi-block and flexible internal blockage capabilities to handle complex geometries. In addition, for general industrial combustion applications, the code provides both dilution and transpiration cooling capabilities. The ALLSPD algorithm, which employs preconditioning and eigenvalue rescaling techniques, provides efficient solutions for flows over a wide range of Mach numbers. Although written for three-dimensional flows in general, the code can be used for two-dimensional and axisymmetric flow computations as well. The code is written so that it can be run on various computer platforms (supercomputers, workstations and parallel processors), and the GUI (Graphical User Interface) is intended to provide a user-friendly tool for setting up and running the code.
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high-speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega type. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). During execution, the processors normally operate quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not locked in step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
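Assuming "Omega" refers to the standard multistage Omega (shuffle-exchange) network, which the abstract does not describe further, destination-tag routing through such a network can be sketched in Python. With N = 2^n ports and n stages of 2 × 2 switches, each stage applies a perfect shuffle to the line address, and the switch then sets the low bit to the next destination bit, most significant first. The function name is hypothetical, and contention between simultaneous routes is not modelled.

```python
def omega_route(src, dst, n_bits):
    """Destination-tag routing through an Omega network with 2**n_bits ports.

    Returns the line index occupied after each of the n_bits stages.
    Each stage performs a perfect shuffle (cyclic left rotation of the
    address bits) followed by a 2x2 switch that sets the lowest bit to
    the corresponding destination bit, most significant bit first.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n_bits-wide address left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Switch setting: copy destination bit (n_bits - 1 - stage) into the low bit.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path
```

For instance, `omega_route(2, 5, 3)` passes through lines `[5, 2, 5]` and ends at port 5. Only the destination address is needed to set each switch, so no global routing table is required.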
A parallel solver for huge dense linear systems

NASA Astrophysics Data System (ADS)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that helps scientists and engineers solve very large dense linear systems in parallel. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage secondary memory in order to solve huge linear systems of dimension O(100 000). The API is based on the parallel linear algebra library PLAPACK and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to users, hiding almost all of the technical aspects related to the parallel execution of the code and the use of secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms show high performance, reaching more than 1 TFLOPS with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.

New version program summary
Program title: Huge Dense System Solver (HDSS)
Catalogue identifier: AEHU_v1_1
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 87 062
No. of bytes in distributed program, including test data, etc.: 1 069 110
Distribution format: tar.gz
Programming language: Fortran90, C
Computer: Parallel architectures: multiprocessors, computer clusters
Operating system: Linux/Unix
Has the code been vectorized or parallelized?: Yes, includes MPI primitives.
RAM: Tested for up to 190 GB
Classification: 6.5
External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution).
Catalogue identifier of previous version: AEHU_v1_0
Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533
Does the new version supersede the previous version?: Yes
Nature of problem: Huge-scale dense systems of linear equations, Ax = B, beyond standard LAPACK capabilities.
Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary-storage algorithms when the available main memory is insufficient.
Reasons for new version: Many applications require guaranteed high accuracy in the solution of very large linear systems, which can be achieved by using double-precision arithmetic.
Summary of revisions: Version 1.1 can be used to solve linear systems using double-precision arithmetic. It includes a new version of the initialization routine, in which the user can choose the kind of arithmetic and the values of several parameters of the environment.
Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
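The solution method named in the program summary, an LU factorization reused for many right-hand sides, can be illustrated with a small in-core Python sketch. It uses Gaussian elimination with partial pivoting on plain lists; it is not the HDSS API, it does not model PLAPACK's data distribution or POOCLAPACK's out-of-core tiling, and the function names are hypothetical.

```python
def lu_factor(a):
    """Factor a copy of square matrix a (list of rows) as P A = L U with partial pivoting.

    Returns (lu, perm): lu stores the unit lower-triangular L below the diagonal
    and U on and above it; row i of P A is original row perm[i].
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            factor = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A X = B for every column of B (n x m list of rows), reusing one factorization."""
    n, m = len(lu), len(b[0])
    x = [list(map(float, b[perm[i]])) for i in range(n)]  # apply P to B
    # Forward substitution with the unit lower-triangular L.
    for i in range(n):
        for k in range(i):
            for c in range(m):
                x[i][c] -= lu[i][k] * x[k][c]
    # Back substitution with the upper-triangular U.
    for i in range(n - 1, -1, -1):
        for k in range(i + 1, n):
            for c in range(m):
                x[i][c] -= lu[i][k] * x[k][c]
        for c in range(m):
            x[i][c] /= lu[i][i]
    return x
```

Factoring once costs O(n^3) operations while each additional right-hand side costs only O(n^2), so the factorization cost is amortized over all columns of B.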
Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

2016-08-01

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which is necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9).
These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
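For reference, the "simple, sequential and unoptimized program of O(N^2) calculation cost" mentioned above corresponds to direct summation of pairwise gravitational accelerations. The minimal Python sketch below assumes units with G = 1 and a Plummer softening length `eps`; it is not FDPS's template interface, only the kind of kernel such a framework is meant to take over and parallelize.

```python
import math


def accelerations(pos, mass, eps=1e-3, G=1.0):
    """Direct-summation gravitational accelerations: O(N^2) pair interactions.

    pos: list of (x, y, z) tuples; mass: list of masses.
    eps is an assumed Plummer softening length that avoids the singularity at r = 0.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = eps * eps
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3  # |a| / r contribution from particle j
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

In a framework such as FDPS, the inner loop over j is what would be replaced by tree-based (Barnes-Hut) evaluation and distributed across nodes after domain decomposition.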
Concurrency-based approaches to parallel programming

NASA Technical Reports Server (NTRS)

Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.

1995-01-01

The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of application developers, this paper presents three specific approaches aimed at the development of efficient and reusable parallel software for irregular and dynamically structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. The benefits of individual approaches such as these can be leveraged by an interoperability environment that permits modules written using different approaches to coexist in a single application.
Reliability models for dataflow computer systems

NASA Technical Reports Server (NTRS)

Kavi, K. M.; Buckles, B. P.

1985-01-01

The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and freedom from deadlock in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures, including data flow computers.
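The token-firing behaviour of a data flow graph, and the kind of deadlock that liveness conditions rule out, can be illustrated with a small Python simulator. This is a generic, simplified sketch rather than the paper's model: each node fires at most once, when every input arc holds a token, and the run is reported as deadlocked when some node has not fired yet no node can fire. The function name and graph encoding are hypothetical.

```python
def run_dataflow(nodes, initial_tokens):
    """Simplified single-firing data flow simulation.

    nodes: dict name -> (input_arcs, output_arcs, fn), where fn maps the list of
           input token values to the list of output token values.
    initial_tokens: dict arc -> value for arcs that start with a token.
    Returns (tokens, fired_order, deadlocked).
    """
    tokens = dict(initial_tokens)
    fired = []
    pending = set(nodes)
    progress = True
    while pending and progress:
        progress = False
        for name in sorted(pending):  # sorted() copies, so discarding below is safe
            inputs, outputs, fn = nodes[name]
            if all(arc in tokens for arc in inputs):
                # Firing rule: consume one token per input arc, emit one per output arc.
                values = [tokens.pop(arc) for arc in inputs]
                for arc, value in zip(outputs, fn(values)):
                    tokens[arc] = value
                fired.append(name)
                pending.discard(name)
                progress = True
    # Nodes still pending with no enabled node left means the graph is deadlocked.
    return tokens, fired, bool(pending)
```

For example, a graph computing (a + b) * c fires `add` and then `mul`, while a two-node cycle with no initial tokens fires nothing and is reported as deadlocked.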
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

2001-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
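The "dynamic distribution and redistribution of data" at well-defined points can be made concrete for a one-dimensional block-distributed array: when the processor partition changes size, each element range moves from its old owner to its new owner. The minimal Python sketch below assumes a contiguous block distribution; the function names are hypothetical and are not taken from the patent.

```python
def block_bounds(n, parts):
    """Half-open (lo, hi) bounds of a near-even block distribution of n elements."""
    base, extra = divmod(n, parts)
    bounds, lo = [], 0
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def redistribution_plan(n, old_parts, new_parts):
    """List transfers (src, dst, lo, hi) to go from old_parts owners to new_parts owners.

    Intended to be executed only at a safe point (e.g. between iterations),
    where no processor is in the middle of updating its block.
    Entries with src == dst are local copies and need no communication.
    """
    old = block_bounds(n, old_parts)
    new = block_bounds(n, new_parts)
    plan = []
    for src, (olo, ohi) in enumerate(old):
        for dst, (nlo, nhi) in enumerate(new):
            lo, hi = max(olo, nlo), min(ohi, nhi)
            if lo < hi:  # the two blocks overlap
                plan.append((src, dst, lo, hi))
    return plan
```

Shrinking a 12-element array from 4 to 3 processors, for instance, needs six transfers, two of which (`(0, 0, 0, 3)` and `(2, 2, 8, 9)`) stay on the same processor.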
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

1999-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Are implicit self-esteem measures valid for assessing individual and cultural differences?

PubMed

Falk, Carl F; Heine, Steven J; Takemura, Kosuke; Zhang, Cathy X J; Hsu, Chih-Wei

2015-02-01

Our research utilized two popular theoretical conceptualizations of implicit self-esteem: 1) implicit self-esteem as a global automatic reaction to the self; and 2) implicit self-esteem as a context/domain specific construct. Under this framework, we present an extensive search for implicit self-esteem measure validity among different cultural groups (Study 1) and under several experimental manipulations (Study 2). In Study 1, Euro-Canadians (N = 107), Asian-Canadians (N = 187), and Japanese (N = 112) completed a battery of implicit self-esteem, explicit self-esteem, and criterion measures. Included implicit self-esteem measures were either popular or provided methodological improvements upon older methods. Criterion measures were sampled from previous research on implicit self-esteem and included self-report and independent ratings. In Study 2, Americans (N = 582) completed a shorter battery of these same types of measures under either a control condition, an explicit prime meant to activate the self-concept in a particular context, or a prime meant to activate self-competence related implicit attitudes. Across both studies, explicit self-esteem measures far outperformed implicit self-esteem measures in all cultural groups and under all experimental manipulations. Implicit self-esteem measures are not valid for individual or cross-cultural comparisons. We speculate that individuals may not form implicit associations with the self as an attitudinal object. © 2013 Wiley Periodicals, Inc.
Implicit self-esteem decreases in adolescence: a cross-sectional study.

PubMed

Cai, Huajian; Wu, Mingzheng; Luo, Yu L L; Yang, Jing

2014-01-01

Implicit self-esteem has remained an active research topic in both the areas of implicit social cognition and self-esteem in recent decades. The purpose of this study is to explore the development of implicit self-esteem in adolescents. A total of 599 adolescents from junior and senior high schools in East China participated in the study. They ranged in age from 11 to 18 years with a mean age of 14.10 (SD = 2.16). The degree of implicit self-esteem was assessed using the Implicit Association Test (IAT) with the improved D score as the index. Participants also completed the Rosenberg Self-Esteem Scale (α = 0.77). For all surveyed ages, implicit self-esteem was positively biased, all ts>8.59, all ps<0.001. The simple correlation between implicit self-esteem and age was significant, r = -.25, p = 1. 10(-10). A regression with implicit self-esteem as the criterion variable, and age, gender, and age × gender interaction as predictors further revealed a significant negative linear relationship between age and implicit self-esteem, β = -0.19, t = -3.20, p = 0.001. However, explicit self-esteem showed an inverted-U shape throughout adolescence. Implicit self-esteem in adolescence manifests a declining trend with increasing age, suggesting that it is sensitive to developmental or age-related changes. This finding enriches our understanding of the development of implicit social cognition.
Using Implicit Measures to Highlight Science Teachers' Implicit Theories of Intelligence

ERIC Educational Resources Information Center

Mascret, Nicolas; Roussel, Peggy; Cury, François

2015-01-01

Using an innovative method, a Single-Target Implicit Association Test (ST-IAT) was created to explore the implicit theories of intelligence among science and liberal arts teachers and their relationships with their gender. The results showed that for science teachers--especially for male teachers--there was a negative implicit association between…
The Roles of Implicit Understanding of Engineering Ethics in Student Teams' Discussion.

PubMed

Lee, Eun Ah; Grohman, Magdalena; Gans, Nicholas R; Tacca, Marco; Brown, Matthew J

2017-12-01

Following previous work showing that engineering students possess different levels of understanding of ethics (implicit and explicit), this study focuses on how students' implicit understanding of engineering ethics influences their team discussion process in cases where there is significant divergence between their explicit and implicit understanding. We observed student teams during group discussions of the ethical issues involved in their engineering design projects. Through micro-scale discourse analysis based on cognitive ethnography, we found two possible ways in which implicit understanding influenced the discussion. In one case, implicit understanding played the role of intuitive ethics: an intuitive judgment followed by reasoning. In the other case, implicit understanding played the role of ethical insight, emotionally guiding the direction of the discussion. In either case, however, implicit understanding did not have a strong influence, and the conclusion of the discussion reflected students' explicit understanding. Because students' implicit understanding represented broader social implications of engineering design in both cases, we suggest taking account of students' relevant implicit understanding in engineering education, to help students become more socially responsible engineers.
Parallel community climate model: Description and user's guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written, and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
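The decomposition of the horizontal grid into geographical patches can be sketched generically as a two-dimensional block decomposition of a latitude × longitude grid over a processor mesh. The Python below assumes rectangular patches and row-major rank numbering; it is not the PCCM2 code, and the function names and the 64 × 128 grid (the size of a T42 Gaussian grid) are illustrative.

```python
def split(n, parts):
    """Half-open bounds of a near-even block split of n grid points into parts pieces."""
    base, extra = divmod(n, parts)
    out, lo = [], 0
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        out.append((lo, hi))
        lo = hi
    return out


def patches(nlat, nlon, p_lat, p_lon):
    """Map each processor rank to its (lat_lo, lat_hi, lon_lo, lon_hi) patch.

    Ranks are numbered row-major over a p_lat x p_lon processor mesh.
    """
    lat_blocks = split(nlat, p_lat)
    lon_blocks = split(nlon, p_lon)
    owner = {}
    for i, (a, b) in enumerate(lat_blocks):
        for j, (c, d) in enumerate(lon_blocks):
            owner[i * p_lon + j] = (a, b, c, d)
    return owner
```

With `patches(64, 128, 4, 8)`, each of the 32 ranks owns a 16 × 16 patch, so column physics on that patch needs no communication, whereas the transform-based dynamics still requires data movement between patches.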
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
The Goddard Space Flight Center Program to develop parallel image processing systems

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1972-01-01

Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
Parallel Volunteer Learning during Youth Programs

ERIC Educational Resources Information Center

Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

2012-01-01

Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…
The Effect of Implicitly Incentivized Faking on Explicit and Implicit Measures of Doping Attitude: When Athletes Want to Pretend an Even More Negative Attitude to Doping

PubMed Central

Wolff, Wanja; Schindler, Sebastian; Brand, Ralf

2015-01-01

The Implicit Association Test (IAT) aims to measure participants’ automatic evaluation of an attitude object and is useful especially for the measurement of attitudes related to socially sensitive subjects, e.g. doping in sports. Several studies indicate that IAT scores can be faked on instruction. But fully or semi-instructed research scenarios might not properly reflect what happens in more realistic situations, when participants secretly decide to try faking the test. The present study is the first to investigate IAT faking when there is only an implicit incentive to do so. Sixty-five athletes (22.83 years ± 2.45; 25 women) were randomly assigned to an incentive-to-fake condition or a control condition. Participants in the incentive-to-fake condition were manipulated to believe that athletes with lenient doping attitudes would be referred to a tedious 45-minute anti-doping program. Attitudes were measured with the pictorial doping brief IAT (BIAT) and with the Performance Enhancement Attitude Scale (PEAS). A one-way MANOVA revealed significant differences between conditions after the manipulation in PEAS scores, but not in the doping BIAT. In the light of our hypothesis this suggests that participants successfully faked an exceedingly negative attitude to doping when completing the PEAS, but were unsuccessful in doing so on the reaction time-based test. This study assessed BIAT faking in a setting that aimed to resemble a situation in which participants want to hide their attempts to cheat. The two measures of attitude were differentially affected by the implicit incentive. Our findings provide evidence that the pictorial doping BIAT is relatively robust against spontaneous and naïve faking attempts. (B)IATs might be less prone to faking than implied by previous studies. PMID:25902142
Assessment of implicit health attitudes: a multitrait-multimethod approach and a comparison between patients with hypochondriasis and patients with anxiety disorders.

PubMed

Weck, Florian; Höfling, Volkmar

2015-01-01

Two adaptations of the Implicit Association Task were used to assess implicit anxiety (IAT-Anxiety) and implicit health attitudes (IAT-Hypochondriasis) in patients with hypochondriasis (n = 58) and anxiety patients (n = 71). Explicit anxieties and health attitudes were assessed using questionnaires. The analysis of several multitrait-multimethod models indicated that the low correlation between explicit and implicit measures of health attitudes is due to the substantial methodological differences between the IAT and the self-report questionnaire. Patients with hypochondriasis displayed significantly more dysfunctional explicit and implicit health attitudes than anxiety patients, but no differences were found regarding explicit and implicit anxieties. The study demonstrates the specificity of explicit and implicit dysfunctional health attitudes among patients with hypochondriasis.
