Sample records for parallel adaptive solution

  1. Dynamic grid refinement for partial differential equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantages of this algorithm for adaptive refinement with moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.

  2. Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
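
    The remapping step lends itself to a compact illustration. Below is a minimal greedy sketch of the idea (hypothetical data and function names, not the paper's actual heuristic): given how much of each new partition already resides on each processor, keep the largest overlaps in place so that the least data has to move.

```python
# overlap[p][q] = amount of new partition p's data already resident on
# processor q. Greedily keep the largest overlaps in place; whatever is
# not kept in place must be redistributed over the network.
def greedy_remap(overlap):
    pairs = sorted(((amt, p, q)
                    for p, row in enumerate(overlap)
                    for q, amt in enumerate(row)), reverse=True)
    mapping, taken_p, taken_q = {}, set(), set()
    for amt, p, q in pairs:
        if p not in taken_p and q not in taken_q:
            mapping[p] = q            # partition p stays (mostly) on q
            taken_p.add(p); taken_q.add(q)
    return mapping

overlap = [[8, 1, 0],   # partition 0 lives mostly on processor 0
           [2, 1, 7],   # partition 1 lives mostly on processor 2
           [0, 6, 3]]   # partition 2 lives mostly on processor 1
print(greedy_remap(overlap))          # {0: 0, 1: 2, 2: 1}
```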

  3. Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations

    NASA Technical Reports Server (NTRS)

    Chrisochoides, Nikos

    1995-01-01

    We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDEs) on multiprocessors. Multithreading is used as a means of exploiting concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask the overhead of dynamically balancing processor workloads with the computations required for the actual numerical solution of the PDEs. Multithreading can also simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data-parallel adaptive PDE computations. Unfortunately, multithreading does not always reduce program complexity; it can hamper code reuse and increase software complexity.
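
    A minimal sketch of the masking idea (assumed structure; the paper's multithreaded model is far more general): a balancer thread computes the new workload distribution while the solver thread continues useful work, so the balancing latency is overlapped with computation.

```python
import threading
import time

def compute_new_partition(workloads):
    time.sleep(0.1)                   # stand-in for an expensive partitioner
    return sorted(range(len(workloads)), key=workloads.__getitem__)

workloads, result = [5, 1, 9, 3], {}
balancer = threading.Thread(
    target=lambda: result.update(plan=compute_new_partition(workloads)))
balancer.start()                      # balancing runs in the background

for step in range(3):                 # solver keeps advancing meanwhile
    time.sleep(0.05)                  # stand-in for a solver iteration

balancer.join()                       # swap in the new partition when ready
print("apply new partition:", result["plan"])
```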

  4. Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

    1996-01-01

    Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalances among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3 percent off the optimal solutions, but requires only 1 percent of the computational time.

  5. Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

    1999-01-01

    The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG, using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However, performance deteriorates when the mesh is refined using a solution-based error indicator, since mesh adaptation for practical problems occurs in a localized region, creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load-balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
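
    The central PLUM trick, repartitioning on predicted post-refinement weights before subdividing, can be demonstrated with a toy balance computation (hypothetical weights and a stock greedy balancer, not PLUM's actual partitioner):

```python
import heapq

def balance(weights, nprocs):
    # longest-processing-time greedy: heaviest element to lightest processor
    heap = [(0, p) for p in range(nprocs)]
    loads = [0] * nprocs
    for w in sorted(weights, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + w
        heapq.heappush(heap, (loads[p], p))
    return loads

targeted = set(range(4))              # refinement confined to one region
predicted = [8 if e in targeted else 1 for e in range(16)]  # 8 children each

# subdividing first, with the old contiguous 4-way partition, is lopsided:
naive = [sum(predicted[e] for e in range(4 * p, 4 * p + 4)) for p in range(4)]
print("subdivide first:  ", naive)                        # [32, 4, 4, 4]
print("repartition first:", balance(predicted, 4))        # [11, 11, 11, 11]
```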

  6. Parallel, adaptive finite element methods for conservation laws

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.

    1994-01-01

    We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
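
    For a scalar 1D conservation law u_t + f(u)_x = 0, the semi-discrete form underlying such a scheme can be written on an element K = [x_L, x_R], for each Legendre test function v, as (a textbook statement of the discontinuous Galerkin method, not the paper's exact notation):

    $$\int_K \frac{\partial u_h}{\partial t}\,v\,dx = \int_K f(u_h)\,\frac{\partial v}{\partial x}\,dx - \hat{f}\big(u_h(x_R^-),u_h(x_R^+)\big)\,v(x_R^-) + \hat{f}\big(u_h(x_L^-),u_h(x_L^+)\big)\,v(x_L^+),$$

    where \hat{f} is a dissipative numerical flux at the element interfaces; the resulting system of ODEs is advanced by the Runge-Kutta method, with projection limiting applied to control oscillations near discontinuities.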

  7. Modified Method of Adaptive Artificial Viscosity for Solution of Gas Dynamics Problems on Parallel Computer Systems

    NASA Astrophysics Data System (ADS)

    Popov, Igor; Sukov, Sergey

    2018-02-01

    A modification of the adaptive artificial viscosity (AAV) method is considered. This modification is based on a one-stage time approximation and is adapted to the calculation of gas dynamics problems on unstructured grids with an arbitrary type of grid elements. The proposed numerical method has simplified logic, better performance, and better parallel efficiency compared to the implementation of the original AAV method. Computer experiments demonstrate the robustness of the method and its convergence to the difference solution.
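
    The adaptive-viscosity mechanism itself is easy to illustrate on 1D Burgers' equation (a generic sketch, not the authors' AAV formulation, whose sensor and coefficients differ): viscosity is switched on cell by cell only where a smoothness sensor fires.

```python
import numpy as np

# u_t + (u^2/2)_x = (nu u_x)_x on a periodic grid. nu is zero in smooth
# regions and ~|u|*dx/2 where the undivided second difference signals a
# steepening front, mimicking adaptive artificial viscosity.
n = 200
dx, dt = 1.0 / n, 0.001
x = np.linspace(0.0, 1.0, n, endpoint=False)
u = np.sin(2.0 * np.pi * x) + 0.5     # steepens into a shock by t ~ 0.16

for step in range(300):
    d2 = np.abs(np.roll(u, 1) - 2.0 * u + np.roll(u, -1))
    nu = np.where(d2 > 0.005, 0.5 * dx * np.abs(u), 0.0)  # sensor-triggered
    flux = 0.5 * u * u
    u = u + dt * (-(np.roll(flux, -1) - np.roll(flux, 1)) / (2.0 * dx)
                  + nu * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2)

print(u.min(), u.max())   # solution stays bounded; nu active only near front
```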

  8. Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Key, K.; Ovall, J.; Holst, M.

    2014-12-01

    We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal-oriented adaptive refinement code named MARE2DEM. We demonstrate the performance and parallel scaling of this algorithm on a medium-scale computing cluster with a marine controlled-source EM example that includes a 3D array of receivers located over a 3D model with significant seafloor bathymetry variations and a heterogeneous subsurface.
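
    In caricature, the dual-weighted residual indicator multiplies each element's primal residual by an adjoint weight tied to the receiver functional, so refinement concentrates only where local error actually pollutes the receiver response. A toy sketch with made-up numbers (not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
residual = rng.random(20)                    # stand-in primal residuals r_K
dual_weight = np.exp(-np.arange(20) / 3.0)   # adjoint influence decays away
                                             # from the receiver location
eta = np.abs(residual * dual_weight)         # eta_K ~ |r_K| * |w_K|

refine = np.flatnonzero(eta > 0.3 * eta.max())
print("elements flagged for refinement:", refine)  # clustered near receiver
```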

  9. Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine

    2004-01-01

    We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.

  10. Parallel adaptive discontinuous Galerkin approximation for thin layer avalanche modeling

    NASA Astrophysics Data System (ADS)

    Patra, A. K.; Nichita, C. C.; Bauer, A. C.; Pitman, E. B.; Bursik, M.; Sheridan, M. F.

    2006-08-01

    This paper describes the development of highly accurate adaptive discontinuous Galerkin schemes for the solution of the equations arising from a thin-layer model of debris flows. Such flows have wide applicability in the analysis of avalanches induced by natural events such as volcanic eruptions and earthquakes. These schemes are coupled with special parallel solution methodologies to produce a simulation tool capable of very high-order numerical accuracy. The methodology successfully replicates cold rock avalanches at Mount Rainier, Washington, and hot volcanic particulate flows at Colima Volcano, Mexico.

  11. Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature-detection-based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
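
    The dual-time mechanism can be sketched on a model ODE du/dt = R(u) (a generic illustration; the paper wraps the inner iterations in its multigrid machinery): each physical step is converged by iterating the unsteady residual to zero in a pseudo-time.

```python
# Backward-Euler dual time stepping for du/dt = R(u): each physical step
# drives the unsteady residual R*(u) = R(u) - (u - u_n)/dt to zero by
# explicit iteration in a pseudo-time increment dtau.
def R(u):
    return -u                         # model problem, exact solution e^{-t}

dt, dtau, u = 0.1, 0.02, 1.0
for step in range(10):                # march physical time to t = 1
    u_n = u
    for it in range(500):             # pseudo-time inner iterations
        r = R(u) - (u - u_n) / dt
        if abs(r) < 1e-12:
            break
        u += dtau * r

print(u)   # ~0.3855 = 1/1.1**10, vs exp(-1) ~ 0.3679 (first-order in dt)
```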

  12. An object-oriented approach for parallel self adaptive mesh refinement on block structured grids

    NASA Technical Reports Server (NTRS)

    Lemke, Max; Witsch, Kristian; Quinlan, Daniel

    1993-01-01

    Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library permitting efficient development of architecture-independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmer's work is reduced primarily to specifying the serial single-grid application; the parallel, self-adaptive mesh refinement code is then obtained with minimal additional effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.

  13. Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors - Performance results

    NASA Technical Reports Server (NTRS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.

  14. Cartesian Off-Body Grid Adaption for Viscous Time-Accurate Flow Simulation

    NASA Technical Reports Server (NTRS)

    Buning, Pieter G.; Pulliam, Thomas H.

    2011-01-01

    An improved solution adaption capability has been implemented in the OVERFLOW overset grid CFD code. Building on the Cartesian off-body approach inherent in OVERFLOW and the original adaptive refinement method developed by Meakin, the new scheme provides for automated creation of multiple levels of finer Cartesian grids. Refinement can be based on the undivided second difference of the flow solution variables, or on a specific flow quantity such as vorticity. Coupled with load balancing and an in-memory solution interpolation procedure, the adaption process provides very good performance for time-accurate simulations on parallel compute platforms. A method of using refined, thin body-fitted grids combined with adaption in the off-body grids is presented, which maximizes the part of the domain subject to adaption. Two- and three-dimensional examples are used to illustrate the effectiveness and performance of the adaption scheme.
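
    The undivided second-difference sensor is simple enough to sketch directly (generic illustration; OVERFLOW's actual sensor, scaling, and thresholds differ):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)
q = np.tanh((x - 0.5) / 0.02)                # sharp layer near x = 0.5

# undivided second difference: no division by dx**2, so the sensor
# measures resolvability on the *current* grid rather than smoothness
d2 = np.abs(q[:-2] - 2.0 * q[1:-1] + q[2:])
flag = np.flatnonzero(d2 > 0.1) + 1          # interior points to refine
print("refine cells:", flag)                 # clustered around the layer
```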

  15. Distributed parameter system coupled ARMA expansion identification and adaptive parallel IIR filtering - A unified problem statement. [Autoregressive Moving-Average]

    NASA Technical Reports Server (NTRS)

    Johnson, C. R., Jr.; Balas, M. J.

    1980-01-01

    A novel interconnection of distributed parameter system (DPS) identification and adaptive filtering is presented, which culminates in a common statement of coupled autoregressive, moving-average expansion or parallel infinite impulse response configuration adaptive parameterization. The common restricted complexity filter objectives are seen as similar to the reduced-order requirements of the DPS expansion description. The interconnection presents the possibility of an exchange of problem formulations and solution approaches not yet easily addressed in the common finite dimensional lumped-parameter system context. It is concluded that the shared problems raised are nevertheless many and difficult.

  16. Cooperative parallel adaptive neighbourhood search for the disjunctively constrained knapsack problem

    NASA Astrophysics Data System (ADS)

    Quan, Zhe; Wu, Lei

    2017-09-01

    This article investigates the use of parallel computing for solving the disjunctively constrained knapsack problem. The proposed parallel computing model can be viewed as a cooperative algorithm based on a multi-neighbourhood search. The cooperation system is composed of a team manager and a crowd of team members. The team members aim at applying their own search strategies to explore the solution space. The team manager collects the solutions from the members and shares the best one with them. The performance of the proposed method is evaluated on a group of benchmark data sets. The results obtained are compared to those reached by the best methods from the literature. The results show that the proposed method is able to provide the best solutions in most cases. In order to highlight the robustness of the proposed parallel computing model, a new set of large-scale instances is introduced. Encouraging results have been obtained.
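
    The manager/members structure can be sketched with threads and a toy instance (hypothetical data and neighbourhood; the paper's search strategies are more elaborate): each member runs its own randomized neighbourhood search and shares improvements through a lock-protected incumbent.

```python
import random
import threading

values = [10, 7, 6, 9, 8, 4]          # toy disjunctively constrained knapsack
weights = [5, 4, 3, 5, 4, 2]
capacity = 12
conflicts = {(0, 3), (1, 4)}          # pairs that cannot both be packed

def value(sol):
    return sum(v for i, v in enumerate(values) if sol[i])

def feasible(sol):
    ok_w = sum(w for i, w in enumerate(weights) if sol[i]) <= capacity
    return ok_w and all(not (sol[a] and sol[b]) for a, b in conflicts)

best = {"sol": [0] * len(values), "val": 0}   # the "team manager"
lock = threading.Lock()

def member(seed, iters=2000):
    rng = random.Random(seed)
    with lock:                        # start from the shared incumbent
        sol = best["sol"][:]
    for _ in range(iters):
        cand = sol[:]
        cand[rng.randrange(len(cand))] ^= 1   # one-flip neighbourhood
        if feasible(cand) and value(cand) >= value(sol):
            sol = cand
            with lock:                # report improvements to the manager
                if value(sol) > best["val"]:
                    best["sol"], best["val"] = sol[:], value(sol)

members = [threading.Thread(target=member, args=(s,)) for s in range(4)]
for t in members: t.start()
for t in members: t.join()
print(best)                           # e.g. value 23 with items {0, 1, 2}
```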

  17. Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

    NASA Technical Reports Server (NTRS)

    Weeks, Cindy Lou

    1986-01-01

    Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

  18. Adaptive Identification by Systolic Arrays.

    DTIC Science & Technology

    1987-12-01

    Front-matter excerpt (OCR): Bibliography includes Anton, Howard, Elementary Linear Algebra, John Wiley & Sons, 1984, and Cristi, Roberto, A Parallel Structure for Adaptive Pole Placement. Contents include: II. System Identification Methods: A. Linear System Modeling; B. Solution of Systems of Linear Equations; C. QR Decomposition; D. Recursive Least Squares; E. Block ...

  19. Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP

    NASA Technical Reports Server (NTRS)

    Norton, C.; Balsara, D.

    1999-01-01

    Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programming can be performed in an object-oriented way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.

  20. A parallel adaptive mesh refinement algorithm

    NASA Technical Reports Server (NTRS)

    Quirk, James J.; Hanebutte, Ulf R.

    1993-01-01

    Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.

  1. Unstructured mesh algorithms for aerodynamic calculations

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.

    1992-01-01

    The use of unstructured mesh techniques for solving complex aerodynamic flows is discussed. The principal advantages of unstructured mesh strategies, as they relate to complex geometries, adaptive meshing capabilities, and parallel processing, are emphasized. The various aspects required for the efficient and accurate solution of aerodynamic flows are addressed. These include mesh generation, mesh adaptivity, solution algorithms, convergence acceleration, and turbulence modeling. Computations of viscous turbulent two-dimensional flows and inviscid three-dimensional flows about complex configurations are demonstrated. Remaining obstacles and directions for future research are also outlined.

  2. Adaptive implicit-explicit and parallel element-by-element iteration schemes

    NASA Technical Reports Server (NTRS)

    Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.

    1989-01-01

    Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation clearer, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests demonstrate the savings in CPU time and memory.

  3. Capabilities of Fully Parallelized MHD Stability Code MARS

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2016-10-01

    Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. A parallel version of MARS, named PMARS, has recently been developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iteration algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.

  4. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Results of the MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  5. Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2

    NASA Technical Reports Server (NTRS)

    Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  6. Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  7. Design of Unstructured Adaptive (UA) NAS Parallel Benchmark Featuring Irregular, Dynamic Memory Accesses

    NASA Technical Reports Server (NTRS)

    Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.

  8. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    NASA Technical Reports Server (NTRS)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  9. Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fijany, A.; Milman, M.; Redding, D.

    1994-12-31

    In this paper, massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optics system (SELENE) are presented. The authors have already shown that the computation of a near-optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, due to the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem, since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm, designated the Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other fast Poisson solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.
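
    The Poisson reduction can be made concrete with the simplest possible solver (a Jacobi sketch on the unit square; the fast invariant imbedding algorithm of the paper is a far faster way to solve the same equation):

```python
import numpy as np

# -lap(u) = f on the unit square, u = 0 on the boundary, 5-point stencil.
n = 31
h = 1.0 / (n + 1)
f = np.ones((n, n))                   # stand-in for the wavefront data
u = np.zeros((n + 2, n + 2))          # zero boundary values in the padding

for _ in range(2000):                 # Jacobi iteration
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:] + h * h * f)

print(u[n // 2 + 1, n // 2 + 1])      # ~0.074 at the center for this setup
```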

  10. Applying Parallel Adaptive Methods with GeoFEST/PYRAMID to Simulate Earth Surface Crustal Dynamics

    NASA Technical Reports Server (NTRS)

    Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Glasscoe, Margaret; Donnellan, Andrea; Li, Peggy

    2006-01-01

    This viewgraph presentation reviews the use of Adaptive Mesh Refinement (AMR) in simulating the crustal dynamics of the Earth's surface. AMR simultaneously improves solution quality, time to solution, and computer memory requirements when compared to generating/running on a globally fine mesh. The use of AMR in simulating the dynamics of the Earth's surface is spurred by proposed NASA missions, such as InSAR, for Earth surface deformation and other measurements. These missions will require support for large-scale adaptive numerical methods using AMR to model observations. AMR was chosen because it has been successful in computational fluid dynamics for predictive simulation of complex flows around complex structures.

  11. Approximation algorithms for scheduling unrelated parallel machines with release dates

    NASA Astrophysics Data System (ADS)

    Avdeenko, T. V.; Mesentsev, Y. A.; Estraykh, I. V.

    2017-01-01

    In this paper we propose approaches to the optimal scheduling of unrelated parallel machines with release dates. One approach is based on a dynamic programming scheme modified with adaptive narrowing of the search domain to ensure computational effectiveness. We discuss the complexity of exact schedule synthesis and compare it with approximate, close-to-optimal solutions. We also explain how the algorithm works on the example of two unrelated parallel machines and five jobs with release dates. Performance results showing the efficiency of the proposed approach are given.
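
    At toy sizes the problem can be solved exactly by enumeration, which makes the objective concrete (hypothetical data; the point of the paper's adaptive narrowing is precisely to avoid this exponential search):

```python
from itertools import product

p = [[3, 5, 2, 4, 6],                 # p[m][j]: time of job j on machine m
     [4, 2, 3, 6, 3]]                 # machines are "unrelated": rows differ
release = [0, 1, 2, 2, 4]

def makespan(assign):                 # assign[j] in {0, 1}
    finish = [0, 0]
    for j in sorted(range(5), key=release.__getitem__):
        m = assign[j]                 # jobs sequenced in release-date order
        finish[m] = max(finish[m], release[j]) + p[m][j]
    return max(finish)

best = min(product((0, 1), repeat=5), key=makespan)
print(best, makespan(best))           # optimal assignment for this ordering
```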

  12. PLUM: Parallel Load Balancing for Unstructured Adaptive Meshes. Degree awarded by Colorado Univ.

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid

    1998-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing large-scale problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture physical phenomena of interest, such procedures make standard computational methods more cost effective. Unfortunately, an efficient parallel implementation of these adaptive methods is rather difficult to achieve, primarily due to the load imbalance created by the dynamically-changing nonuniform grid. This requires significant communication at runtime, leading to idle processors and adversely affecting the total execution time. Nonetheless, it is generally thought that unstructured adaptive-grid techniques will constitute a significant fraction of future high-performance supercomputing. Various dynamic load balancing methods have been reported to date; however, most of them either lack a global view of loads across processors or do not apply their techniques to realistic large-scale applications.

  13. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one-to-one mapping of grids to systolic-style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be no regular global structure to the grids constructed, there will still be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  14. Developing parallel GeoFEST(P) using the PYRAMID AMR library

    NASA Technical Reports Server (NTRS)

    Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.

    2004-01-01

    The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated Earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.

  15. The propagation of the shock wave from a strong explosion in a plane-parallel stratified medium: the Kompaneets approximation

    NASA Astrophysics Data System (ADS)

    Olano, C. A.

    2009-11-01

    Context: Using certain simplifications, Kompaneets derived a partial differential equation that states the local geometrical and kinematical conditions that each surface element of a shock wave, created by a point blast in a stratified gaseous medium, must satisfy. Kompaneets could solve his equation analytically for the case of a wave propagating in an exponentially stratified medium, obtaining the form of the shock front at progressive evolutionary stages. Complete analytical solutions of the Kompaneets equation for shock wave motion in further plane-parallel stratified media have not been found, except for radially stratified media. Aims: We aim to analytically solve the Kompaneets equation for the motion of a shock wave in different plane-parallel stratified media that can reflect a wide variety of astrophysical contexts. We were particularly interested in solving the Kompaneets equation for a strong explosion in the interstellar medium of the Galactic disk, in which, due to intense winds and explosions of stars, gigantic gaseous structures known as superbubbles and supershells are formed. Methods: Using the Kompaneets approximation, we derived a pair of equations, which we call the adapted Kompaneets equations, that govern the propagation of a shock wave in a stratified medium and permit us to obtain solutions in parametric form. The solutions provided by the system of adapted Kompaneets equations are equivalent to those of the Kompaneets equation. We solved the adapted Kompaneets equations for shock wave propagation in a generic stratified medium by means of a power-series method. Results: Using the series solution for a shock wave in a generic medium, we obtained the series solutions for four specific media whose respective density distributions in the direction perpendicular to the stratification plane are exponential, power-law (one with exponent k = -1 and the other with k = -2), and quadratic hyperbolic-secant. From these series solutions, we deduced exact solutions for the four media in terms of elementary functions. The exact solution for shock wave propagation in a medium with a quadratic hyperbolic-secant density distribution is very appropriate for describing the growth of superbubbles in the Galactic disk.
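
    For reference, the Kompaneets approximation is usually summarized by a single first-order PDE for the front radius r(y, z) in a plane-parallel atmosphere (standard form as commonly quoted in the literature, up to notation; the adapted equations of this paper recast it in parametric form):

    $$\left(\frac{\partial r}{\partial y}\right)^{2} = \frac{\rho_0}{\rho(z)}\left[1+\left(\frac{\partial r}{\partial z}\right)^{2}\right],$$

    where \rho(z) is the ambient density stratification, \rho_0 a reference density, and y a transformed time variable.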

  16. Near-Body Grid Adaption for Overset Grids

    NASA Technical Reports Server (NTRS)

    Buning, Pieter G.; Pulliam, Thomas H.

    2016-01-01

    A solution adaption capability for curvilinear near-body grids has been implemented in the OVERFLOW overset grid computational fluid dynamics code. The approach follows closely that used for the Cartesian off-body grids, but inserts refined grids in the computational space of original near-body grids. Refined curvilinear grids are generated using parametric cubic interpolation, with one-sided biasing based on curvature and stretching ratio of the original grid. Sensor functions, grid marking, and solution interpolation tasks are implemented in the same fashion as for off-body grids. A goal-oriented procedure, based on largest error first, is included for controlling growth rate and maximum size of the adapted grid system. The adaption process is almost entirely parallelized using MPI, resulting in a capability suitable for viscous, moving body simulations. Two- and three-dimensional examples are presented.

  17. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads, developed for multi-core parallelism, often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.

  18. The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

    PubMed Central

    Mitchell, William F.

    1998-01-01

    Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
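
    The refinement-tree idea admits a compact sketch (hypothetical tree encoding, not Mitchell's implementation): a depth-first traversal of the refinement tree orders the leaf elements, and cutting this ordering into consecutive chunks of nearly equal weight yields the partitions.

```python
# Each node is (weight, children); the weight matters only at the leaves.
tree = (None, [
    (None, [(2, []), (1, []), (1, [])]),
    (3, []),
    (None, [(1, []), (2, []), (2, [])]),
])

def leaf_weights(node):
    w, kids = node
    if not kids:
        yield w
    for k in kids:
        yield from leaf_weights(k)

def rtree_partition(tree, nparts):
    order = list(leaf_weights(tree))  # depth-first ordering of leaf elements
    total, cum = sum(order), 0
    parts = [[] for _ in range(nparts)]
    for i, w in enumerate(order):
        parts[min(cum * nparts // total, nparts - 1)].append(i)
        cum += w
    return parts

print(rtree_partition(tree, 3))       # [[0, 1, 2], [3, 4], [5, 6]]
```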

  19. The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.

    PubMed

    Mitchell, William F

    1998-01-01

    Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.

  20. Recent Progress on the Parallel Implementation of Moving-Body Overset Grid Schemes

    NASA Technical Reports Server (NTRS)

    Wissink, Andrew; Allen, Edwin (Technical Monitor)

    1998-01-01

    Viscous calculations about geometrically complex bodies in which there is relative motion between component parts are among the most computationally demanding problems facing CFD researchers today. This presentation documents results from the first two years of a CHSSI-funded effort within the U.S. Army AFDD to develop scalable dynamic overset grid methods for unsteady viscous calculations with moving-body problems. The first part of the presentation will focus on results from OVERFLOW-D1, a parallelized moving-body overset grid scheme that employs traditional Chimera methodology. The two processes that dominate the cost of such problems are the flow solution on each component and the intergrid connectivity solution. Parallel implementations of the OVERFLOW flow solver and DCF3D connectivity software are coupled with a proposed two-part static-dynamic load balancing scheme and tested on the IBM SP and Cray T3E multi-processors. The second part of the presentation will cover some recent results from OVERFLOW-D2, a new flow solver that employs Cartesian grids with various levels of refinement, facilitating solution adaption. A study of the parallel performance of the scheme on large distributed-memory multiprocessor computer architectures will be reported.

  1. Parallel implementation of an adaptive and parameter-free N-body integrator

    NASA Astrophysics Data System (ADS)

    Pruett, C. David; Ingham, William H.; Herman, Ralph D.

    2011-05-01

    Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.

    Program summary
    Program title: PNB.f90
    Catalogue identifier: AEIK_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 3052
    No. of bytes in distributed program, including test data, etc.: 68 600
    Distribution format: tar.gz
    Programming language: Fortran 90 and OpenMPI
    Computer: all shared or distributed memory parallel processors
    Operating system: Unix/Linux
    Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized
    RAM: dependent upon N
    Classification: 4.3, 4.12, 6.5
    Nature of problem: high-accuracy numerical evaluation of trajectories of N point masses, each subject to Newtonian gravitation
    Solution method: parallel and adaptive extrapolation in time via power series of arbitrary degree
    Running time: 5.1 s for the demo program supplied with the package

  2. Implicit schemes and parallel computing in unstructured grid CFD

    NASA Technical Reports Server (NTRS)

    Venkatakrishnan, V.

    1995-01-01

    The development of implicit schemes for obtaining steady-state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. Next, the development of explicit and implicit schemes to compute unsteady flows on unstructured grids is discussed. The issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion are then outlined. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.

  3. Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

    DOE PAGES

    Grayver, Alexander V.; Kolev, Tzanio V.

    2015-11-01

    Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, large conductivity contrasts, and numbers of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.
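
    The block-diagonal ingredient can be isolated in a small dense sketch (block-Jacobi preconditioned conjugate gradients on a random SPD matrix; the paper's solver is distributed and combines this with auxiliary-space preconditioning):

```python
import numpy as np

rng = np.random.default_rng(1)
n, nb = 120, 30                       # problem size and block size
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)           # symmetric positive definite test matrix
b = rng.standard_normal(n)

starts = range(0, n, nb)
blocks = [np.linalg.inv(A[i:i + nb, i:i + nb]) for i in starts]
def precond(r):                       # M^{-1} r: solve each diagonal block
    return np.concatenate([B @ r[i:i + nb] for B, i in zip(blocks, starts)])

x, r = np.zeros(n), b.copy()
z = precond(r)
p, rz = z.copy(), r @ z
for it in range(200):                 # preconditioned conjugate gradients
    Ap = A @ p
    alpha = rz / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    if np.linalg.norm(r) < 1e-10 * np.linalg.norm(b):
        break
    z = precond(r)
    rz, rz_old = r @ z, rz
    p = z + (rz / rz_old) * p
print(it, np.linalg.norm(A @ x - b))  # converges in a handful of iterations
```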

  4. Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grayver, Alexander V.; Kolev, Tzanio V.

    Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, large conductivity contrasts, and numbers of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.

  5. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  6. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  7. Many-to-one form-to-function mapping weakens parallel morphological evolution.

    PubMed

    Thompson, Cole J; Ahmed, Newaz I; Veen, Thor; Peichel, Catherine L; Hendry, Andrew P; Bolnick, Daniel I; Stuart, Yoel E

    2017-11-01

    Evolutionary ecologists aim to explain and predict evolutionary change under different selective regimes. Theory suggests that such evolutionary prediction should be more difficult for biomechanical systems in which different trait combinations generate the same functional output: "many-to-one mapping." Many-to-one mapping of phenotype to function enables multiple morphological solutions to meet the same adaptive challenges. Therefore, many-to-one mapping should undermine parallel morphological evolution, and hence evolutionary predictability, even when selection pressures are shared among populations. Studying 16 replicate pairs of lake- and stream-adapted threespine stickleback (Gasterosteus aculeatus), we quantified three parts of the teleost feeding apparatus and used biomechanical models to calculate their expected functional outputs. The three feeding structures differed in their form-to-function relationship from one-to-one (lower jaw lever ratio) to increasingly many-to-one (buccal suction index, opercular 4-bar linkage). We tested for (1) weaker linear correlations between phenotype and calculated function, and (2) less parallel evolution across lake-stream pairs, in the many-to-one systems relative to the one-to-one system. We confirm both predictions, thus supporting the theoretical expectation that increasing many-to-one mapping undermines parallel evolution. Therefore, sole consideration of morphological variation within and among populations might not serve as a proxy for functional variation when multiple adaptive trait combinations exist.

  8. A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS

    PubMed Central

    Fu, Zhisong; Kirby, Robert M.; Whitaker, Ross T.

    2014-01-01

    Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers. PMID:25221418
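
    The flavor of the active-list iteration is easiest to see on a uniform 2-D grid, where the upwind local solver has a closed form. The Python sketch below is a simplified, serial analogue of the paper's tetrahedral algorithm (regular grid instead of an unstructured mesh, isotropic speed only):

      import numpy as np

      def fim_eikonal_2d(speed, src, h=1.0, tol=1e-6):
          # Fast-iterative-style solver for |grad u| = 1/speed on a grid.
          ny, nx = speed.shape
          u = np.full((ny, nx), np.inf)
          u[src] = 0.0
          nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
          active = {(src[0] + di, src[1] + dj) for di, dj in nbrs
                    if 0 <= src[0] + di < ny and 0 <= src[1] + dj < nx}

          def local_solve(i, j):
              a = min(u[i - 1, j] if i > 0 else np.inf,
                      u[i + 1, j] if i < ny - 1 else np.inf)
              b = min(u[i, j - 1] if j > 0 else np.inf,
                      u[i, j + 1] if j < nx - 1 else np.inf)
              if a > b:
                  a, b = b, a
              hf = h / speed[i, j]
              if b - a >= hf:                       # one-sided (causal) update
                  return a + hf
              return 0.5 * (a + b + np.sqrt(2.0 * hf * hf - (a - b) ** 2))

          while active:                             # relax until no node improves
              nxt = set()
              for i, j in active:
                  new = local_solve(i, j)
                  if u[i, j] - new > tol:           # still improving: keep going
                      u[i, j] = new
                      nxt.update((i + di, j + dj) for di, dj in nbrs
                                 if 0 <= i + di < ny and 0 <= j + dj < nx)
              active = nxt
          return u

      u = fim_eikonal_2d(np.ones((32, 32)), src=(0, 0))
      print(u[-1, -1])   # approximates the Euclidean distance from the corner

    On SIMD hardware the sweep over the active list is the data-parallel kernel; the paper's contributions are the tetrahedral local solver, the data compaction, and the memory layout that keep this kernel computationally dense.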

  9. Application of Parallel Adjoint-Based Error Estimation and Anisotropic Grid Adaptation for Three-Dimensional Aerospace Configurations

    NASA Technical Reports Server (NTRS)

    Lee-Rausch, E. M.; Park, M. A.; Jones, W. T.; Hammond, D. P.; Nielsen, E. J.

    2005-01-01

    This paper demonstrates the extension of error estimation and adaptation methods to parallel computations enabling larger, more realistic aerospace applications and the quantification of discretization errors for complex 3-D solutions. Results were shown for an inviscid sonic-boom prediction about a double-cone configuration and a wing/body segmented leading edge (SLE) configuration where the output function of the adjoint was pressure integrated over a part of the cylinder in the near field. After multiple cycles of error estimation and surface/field adaptation, a significant improvement in the inviscid solution for the sonic boom signature of the double cone was observed. Although the double-cone adaptation was initiated from a very coarse mesh, the near-field pressure signature from the final adapted mesh compared very well with the wind-tunnel data which illustrates that the adjoint-based error estimation and adaptation process requires no a priori refinement of the mesh. Similarly, the near-field pressure signature for the SLE wing/body sonic boom configuration showed a significant improvement from the initial coarse mesh to the final adapted mesh in comparison with the wind tunnel results. Error estimation and field adaptation results were also presented for the viscous transonic drag prediction of the DLR-F6 wing/body configuration, and results were compared to a series of globally refined meshes. Two of these globally refined meshes were used as a starting point for the error estimation and field-adaptation process where the output function for the adjoint was the total drag. The field-adapted results showed an improvement in the prediction of the drag in comparison with the finest globally refined mesh and a reduction in the estimate of the remaining drag error. The adjoint-based adaptation parameter showed a need for increased resolution in the surface of the wing/body as well as a need for wake resolution downstream of the fuselage and wing trailing edge in order to achieve the requested drag tolerance. Although further adaptation was required to meet the requested tolerance, no further cycles were computed in order to avoid large discrepancies between the surface mesh spacing and the refined field spacing.
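
    Schematically, such adjoint-based estimates rest on the adjoint-weighted residual relation (standard notation, ours rather than the paper's): for an output J, a coarse solution q_H, and a finer-space residual operator R_h,

        J(q) - J(q_H) \approx -\psi_h^{\mathsf{T}} R_h(q_H),

    where \psi_h solves the adjoint (dual) problem associated with J, here the integrated near-field pressure or the total drag. Elements that contribute heavily to the right-hand side are the ones flagged for refinement, which is why the process needs no a priori refinement of the mesh.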

  10. Adaptive mesh refinement and load balancing based on multi-level block-structured Cartesian mesh

    NASA Astrophysics Data System (ADS)

    Misaka, Takashi; Sasaki, Daisuke; Obayashi, Shigeru

    2017-11-01

    We developed a framework for a distributed-memory parallel computer that enables dynamic data management for adaptive mesh refinement and load balancing. We employed the simple data structure of the building cube method (BCM), where a computational domain is divided into multi-level cubic domains and each cube has the same number of grid points inside, realising a multi-level block-structured Cartesian mesh. Solution-adaptive mesh refinement, which works efficiently with the help of the dynamic load balancing, was implemented by dividing cubes based on mesh refinement criteria. The framework was investigated with the Laplace equation in terms of adaptive mesh refinement, load balancing and parallel efficiency. It was then applied to the incompressible Navier-Stokes equations to simulate a turbulent flow around a sphere. We considered wall-adaptive cube refinement, where the non-dimensional wall distance y+ near the sphere is used as the criterion for mesh refinement. The results showed that the load imbalance due to the y+ adaptive mesh refinement was corrected by the present approach. To utilise the BCM framework more effectively, we also tested a cube-wise algorithm switching in which explicit and implicit time integration schemes are switched depending on the local Courant-Friedrichs-Lewy (CFL) condition in each cube.
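
    The cube-wise switching reduces to a per-cube test of the local CFL number. A minimal Python sketch (the scheme names, the CFL limit, and the cube data are illustrative assumptions, not taken from the paper):

      def choose_scheme(u_max, dx, dt, cfl_limit=1.0):
          # Pick the time integrator for one cube from its local CFL number.
          cfl = u_max * dt / dx
          return "explicit" if cfl <= cfl_limit else "implicit"

      dt = 0.05
      # finer cubes have smaller dx, hence larger local CFL at fixed dt
      for level in range(5):
          u_max, dx = 2.0, 1.0 / 2 ** level
          print(f"level {level}: dx={dx:.4f} -> {choose_scheme(u_max, dx, dt)}")

    Switching to an implicit scheme only in the few highly refined cubes keeps the global time step from being dictated by the smallest cells.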

  11. Adaptive multiple super fast simulated annealing for stochastic microstructure reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryu, Seun; Lin, Guang; Sun, Xin

    2013-01-01

    Fast image reconstruction from statistical information is critical in image fusion from multimodality chemical imaging instrumentation, where the aim is to create a high-resolution image over a large domain. Stochastic methods have been widely used to reconstruct images from two-point correlation functions. The main challenge is to increase the efficiency of the reconstruction. A novel simulated annealing method is proposed for fast image reconstruction. Combining the advantages of very fast cooling schedules, dynamic adaptation and parallelization, the new simulated annealing algorithm increases efficiency by several orders of magnitude, making large-domain image fusion feasible.
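
    A minimal Yeong-Torquato-style loop of this kind is sketched below in Python: a binary image evolves by pixel swaps that preserve the phase fraction, accepted by a Metropolis rule under a rapidly decaying temperature. The target image and all parameters are illustrative; a production code would update the correlation function incrementally and parallelize the proposals rather than recompute an FFT each step:

      import numpy as np

      rng = np.random.default_rng(0)

      def s2(img):
          # two-point correlation (autocorrelation) of a binary image via FFT
          f = np.fft.fft2(img)
          return np.real(np.fft.ifft2(f * np.conj(f))) / img.size

      target = np.zeros((32, 32)); target[8:24, 8:24] = 1.0
      S2_target = s2(target)
      img = rng.permutation(target.ravel()).reshape(target.shape)

      T, cooling = 1e-4, 0.999          # fast geometric cooling (illustrative)
      energy = np.sum((s2(img) - S2_target) ** 2)
      for step in range(20000):
          w = tuple(rng.choice(np.argwhere(img == 1.0)))   # a white pixel
          b = tuple(rng.choice(np.argwhere(img == 0.0)))   # a black pixel
          img[w], img[b] = 0.0, 1.0                        # propose a swap
          new = np.sum((s2(img) - S2_target) ** 2)
          if new > energy and rng.random() > np.exp((energy - new) / T):
              img[w], img[b] = 1.0, 0.0                    # reject: undo swap
          else:
              energy = new
          T *= cooling
      print(f"final correlation mismatch: {energy:.3e}")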

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations had not been possible before because the time-to-solution was more than a factor of 10 too long to complete one physics case within 5 days of wall-clock time. Frontier techniques employed include nested OpenMP parallelism; adaptive parallel I/O; staging I/O and data reduction using dynamic, asynchronous application interactions; and dynamic repartitioning.

  13. Multiscale Simulations of Magnetic Island Coalescence

    NASA Technical Reports Server (NTRS)

    Dorelli, John C.

    2010-01-01

    We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partitioning, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a system of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.

  14. Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Li, Y.

    2016-12-01

    We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multiple sources as well as the parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm, and model perturbations at each iteration step are obtained with the inexact conjugate gradient iteration method. Synthetic test inversions are presented.
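
    In the usual Occam-style notation (our rendering, not necessarily the authors' exact formulation), each Gauss-Newton iteration computes a model update \Delta m from

        (J_k^{\mathsf{T}} W^{\mathsf{T}} W J_k + \lambda R^{\mathsf{T}} R)\,\Delta m = J_k^{\mathsf{T}} W^{\mathsf{T}} W (d - F(m_k)) - \lambda R^{\mathsf{T}} R\, m_k,

    where F is the forward operator, J_k its Jacobian (the parametric sensitivities above) at the current model m_k, W the data weights, R a roughening operator, and \lambda the regularization trade-off parameter. The inexact conjugate gradient iteration solves this system only approximately, which is typically sufficient for the outer Gauss-Newton loop.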

  15. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    NASA Astrophysics Data System (ADS)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.

  16. ADAPTIVE TETRAHEDRAL GRID REFINEMENT AND COARSENING IN MESSAGE-PASSING ENVIRONMENTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hallberg, J.; Stagg, A.

    2000-10-01

    A grid refinement and coarsening scheme has been developed for tetrahedral and triangular grid-based calculations in message-passing environments. The element adaption scheme is based on an edge bisection of elements marked for refinement by an appropriate error indicator. Hash-table/linked-list data structures are used to store nodal and element information. The grid along inter-processor boundaries is refined and coarsened consistently with the update of these data structures via MPI calls. The parallel adaption scheme has been applied to the solution of a transient, three-dimensional, nonlinear, groundwater flow problem. Timings indicate efficiency of the grid refinement process relative to the flow solver calculations.
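
    The hash table's role is to make midpoint creation idempotent, so that two elements (or two processors) bisecting a shared edge agree on the new node. A serial, two-dimensional Python sketch of the idea (the actual code's data layout and the closure rules that keep the mesh conforming are more involved):

      def bisect_marked(nodes, triangles, marked):
          # Bisect the longest edge of each marked triangle. A dictionary
          # keyed by the sorted edge reuses midpoints, so neighbours that
          # share an edge get the same new node. Refinement propagation
          # for conformity is omitted.
          midpoint = {}

          def mid(a, b):
              key = (min(a, b), max(a, b))
              if key not in midpoint:
                  nodes.append([(nodes[a][k] + nodes[b][k]) / 2.0 for k in range(2)])
                  midpoint[key] = len(nodes) - 1
              return midpoint[key]

          def d2(p, q):
              return sum((nodes[p][k] - nodes[q][k]) ** 2 for k in range(2))

          out = []
          for t, tri in enumerate(triangles):
              if t not in marked:
                  out.append(tri)
                  continue
              # rotate so the first two vertices span the longest edge
              a, b, c = max([tri, tri[1:] + tri[:1], tri[2:] + tri[:2]],
                            key=lambda r: d2(r[0], r[1]))
              m = mid(a, b)
              out.extend([(a, m, c), (m, b, c)])
          return out

      nodes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
      print(bisect_marked(nodes, [(0, 1, 2), (1, 3, 2)], marked={0, 1}))

    Keyed this way, the two triangles bisecting their shared edge produce the same midpoint node, which is the property the MPI updates of the shared data structures rely on across inter-processor boundaries.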

  17. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

    PubMed

    Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

    2017-01-21

    The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS is illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.

  18. A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram

    This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances. Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.

  19. Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

    1996-01-01

    The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.

  20. Parallel, Gradient-Based Anisotropic Mesh Adaptation for Re-entry Vehicle Configurations

    NASA Technical Reports Server (NTRS)

    Bibb, Karen L.; Gnoffo, Peter A.; Park, Michael A.; Jones, William T.

    2006-01-01

    Two gradient-based adaptation methodologies have been implemented into the Fun3d refine GridEx infrastructure. A spring-analogy adaptation which provides for nodal movement to cluster mesh nodes in the vicinity of strong shocks has been extended for general use within Fun3d, and is demonstrated for a 70-degree sphere-cone at Mach 2. A more general feature-based adaptation metric has been developed for use with the adaptation mechanics available in Fun3d, and is applicable to any unstructured, tetrahedral, flow solver. The basic functionality of general adaptation is explored through a case of flow over the forebody of a 70-degree sphere-cone at Mach 6. A practical application of Mach 10 flow over an Apollo capsule, computed with the Felisa flow solver, is given to compare the adaptive mesh refinement with uniform mesh refinement. The examples of the paper demonstrate that the gradient-based adaptation capability as implemented can give an improvement in solution quality.

  1. A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

    1999-01-01

    The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.

  2. Parallel Adaptive Simulation of Detonation Waves Using a Weighted Essentially Non-Oscillatory Scheme

    NASA Astrophysics Data System (ADS)

    McMahon, Sean

    The purpose of this thesis was to develop a code that could be used to develop a better understanding of the physics of detonation waves. First, a detonation was simulated in one dimension using ZND theory. Then, using the 1D solution as an initial condition, a detonation was simulated in two dimensions using a weighted essentially non-oscillatory scheme on an adaptive mesh, with the smallest lengthscales being equal to 2-3 flamelet lengths. Code development linking Chemkin, for chemical kinetics, to the adaptive mesh refinement flow solver was completed. The detonation evolved in a way that qualitatively matched the experimental observations; however, the simulation was unable to progress past the formation of the triple point.

  3. A POSTERIORI ERROR ANALYSIS OF TWO STAGE COMPUTATION METHODS WITH APPLICATION TO EFFICIENT DISCRETIZATION AND THE PARAREAL ALGORITHM.

    PubMed

    Chaudhry, Jehanzeb Hameed; Estep, Don; Tavener, Simon; Carey, Varis; Sandelin, Jeff

    2016-01-01

    We consider numerical methods for initial value problems that employ a two-stage approach consisting of solution on a relatively coarse discretization followed by solution on a relatively fine discretization. Examples include adaptive error control, parallel-in-time solution schemes, and efficient solution of adjoint problems for computing a posteriori error estimates. We describe a general formulation of two-stage computations and then perform a general a posteriori error analysis based on computable residuals and solution of an adjoint problem. The analysis accommodates variations in the two-stage computation and in the formulation of the adjoint problems. We apply the analysis to compute "dual-weighted" a posteriori error estimates, to develop novel algorithms for efficient solution that take into account cancellation of error, and to the Parareal Algorithm. We test the various results using several numerical examples.
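
    For concreteness, the Parareal Algorithm treated here is the two-level iteration that couples a cheap coarse propagator G with an accurate fine propagator F through the standard correction

        U_{k+1}^{n+1} = G(U_{k+1}^{n}) + F(U_k^{n}) - G(U_k^{n}),

    where n indexes time slabs and k the iteration: the expensive F solves at iteration k are independent across slabs and run in parallel, while the G sweep is serial. The two-stage analysis above bounds how the errors of both propagators enter a computed quantity of interest.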

  4. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time, allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously, allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests, depending on the problem and hardware. In the version of dccrg presented here, part of the mesh metadata is replicated between MPI processes, reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg.
    Catalogue identifier: AEOM_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
    Licensing provisions: GNU Lesser General Public License version 3
    No. of lines in distributed program, including test data, etc.: 54975
    No. of bytes in distributed program, including test data, etc.: 974015
    Distribution format: tar.gz
    Programming language: C++
    Computer: PC, cluster, supercomputer
    Operating system: POSIX; the code has been parallelized using MPI and tested with 1-32768 processes
    RAM: 10 MB-10 GB per process
    Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20
    External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]
    Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing.
    Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. The Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid.
    Restrictions: Logically cartesian grid.
    Running time: Depends on the hardware, problem and solution method. Small problems can be solved in under a minute; very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here, the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second.
    [1] http://www.mpi-forum.org/
    [2] http://www.boost.org/
    [3] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653
    [4] https://gitorious.org/sfc++

  5. The Center for Multiscale Plasma Dynamics, Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gombosi, Tamas I.

    The University of Michigan participated in the joint UCLA/Maryland fusion science center focused on plasma physics problems for which the traditional separation of the dynamics into microscale and macroscale processes breaks down. These processes involve large scale flows and magnetic fields tightly coupled to the small scale, kinetic dynamics of turbulence, particle acceleration and energy cascade. The interaction between these vastly disparate scales controls the evolution of the system. The enormous range of temporal and spatial scales associated with these problems renders direct simulation intractable even in computations that use the largest existing parallel computers. Our efforts focused on two main problems: the development of Hall MHD solvers on solution adaptive grids and the development of solution adaptive grids using generalized coordinates so that the proper geometry of inertial confinement can be taken into account and efficient refinement strategies can be obtained.

  6. Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks.

    PubMed

    Mulder, Samuel A; Wunsch, Donald C

    2003-01-01

    The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete, and is an often-used benchmark for new optimization techniques. One of the main challenges with this problem is that standard, non-AI heuristic approaches such as the Lin-Kernighan algorithm (LK) and the chained LK variant are currently very effective and in wide use for the common fully connected, Euclidean variant that is considered here. This paper presents an algorithm that uses adaptive resonance theory (ART) in combination with a variation of the Lin-Kernighan local optimization algorithm to solve very large instances of the TSP. The primary advantage of this algorithm over traditional LK and chained-LK approaches is the increased scalability and parallelism allowed by the divide-and-conquer clustering paradigm. Tours obtained by the algorithm are lower quality, but scaling is much better and there is a high potential for increasing performance using parallel hardware.

  7. Evolution Is an Experiment: Assessing Parallelism in Crop Domestication and Experimental Evolution: (Nei Lecture, SMBE 2014, Puerto Rico).

    PubMed

    Gaut, Brandon S

    2015-07-01

    In this commentary, I make inferences about the level of repeatability and constraint in the evolutionary process, based on two sets of replicated experiments. The first experiment is crop domestication, which has been replicated across many different species. I focus on results of whole-genome scans for genes selected during domestication and ask whether genes are, in fact, selected in parallel across different domestication events. If genes are selected in parallel, it implies that the number of genetic solutions to the challenge of domestication is constrained. However, I find no evidence for parallel selection events either between species (maize vs. rice) or within species (two domestication events within beans). These results suggest that there are few constraints on genetic adaptation, but conclusions must be tempered by several complicating factors, particularly the lack of explicit design standards for selection screens. The second experiment involves the evolution of Escherichia coli to thermal stress. Unlike domestication, this highly replicated experiment detected a limited set of genes that appear prone to modification during adaptation to thermal stress. However, the number of potentially beneficial mutations within these genes is large, such that adaptation is constrained at the genic level but much less so at the nucleotide level. Based on these two experiments, I make the general conclusion that evolution is remarkably flexible, despite the presence of epistatic interactions that constrain evolutionary trajectories. I also posit that evolution is so rapid that we should establish a Speciation Prize, to be awarded to the first researcher who demonstrates speciation with a sexual organism in the laboratory.

  8. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  9. PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit

    NASA Technical Reports Server (NTRS)

    MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; deFainchtein, Rosalinda; Packer, Charles

    1999-01-01

    In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code, and then later if they so desire, adding adaptivity.

  10. Parallel adaptive wavelet collocation method for PDEs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu

    2015-10-01

    A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
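
    The adaptation mechanism in wavelet collocation is coefficient thresholding: grid points whose detail (wavelet) coefficients fall below a tolerance are dropped from the computation. A one-level Haar illustration in Python (the solver itself uses different wavelets and many resolution levels, so this shows only the principle):

      import numpy as np

      def haar_threshold(u, eps):
          # One analysis level: split samples into a coarse average part
          # and detail coefficients; keep only details above eps.
          s = (u[0::2] + u[1::2]) / 2.0        # smooth (coarse) part
          d = (u[0::2] - u[1::2]) / 2.0        # detail coefficients
          mask = np.abs(d) > eps               # significant -> keep fine points
          return s, d * mask, mask

      x = np.linspace(0.0, 1.0, 256)
      u = np.tanh((x - 0.5) / 0.01)            # sharp internal layer
      s, d, mask = haar_threshold(u, eps=1e-3)
      print(f"{mask.sum()} of {mask.size} detail coefficients retained")

    Only points near the layer survive the threshold, which is exactly how the method concentrates grid points, and hence work, on localized structures such as coherent vortices.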

  11. The Feasibility of Adaptive Unstructured Computations On Petaflops Systems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Heber, Gerd; Gao, Guang; Saini, Subhash (Technical Monitor)

    1999-01-01

    This viewgraph presentation covers the advantages of mesh adaptation, unstructured grids, and dynamic load balancing. It illustrates parallel adaptive communication and explains PLUM (Parallel dynamic load balancing for adaptive unstructured meshes) and PSAW (Proper Self-Avoiding Walks).

  12. A wavelet approach to binary blackholes with asynchronous multitasking

    NASA Astrophysics Data System (ADS)

    Lim, Hyun; Hirschmann, Eric; Neilsen, David; Anderson, Matthew; Debuhr, Jackson; Zhang, Bo

    2016-03-01

    Highly accurate simulations of binary black holes and neutron stars are needed to address a variety of interesting problems in relativistic astrophysics. We present a new method for solving the Einstein equations (BSSN formulation) using iterated interpolating wavelets. Wavelet coefficients provide a direct measure of the local approximation error for the solution and place collocation points that naturally adapt to features of the solution. Further, they exhibit exponential convergence on unevenly spaced collocation points. The parallel implementation of the wavelet simulation framework presented here deviates from conventional practice in combining multi-threading with a form of message-driven computation sometimes referred to as asynchronous multitasking.

  13. Modeling RF Fields in Hot Plasmas with Parallel Full Wave Code

    NASA Astrophysics Data System (ADS)

    Spencer, Andrew; Svidzinski, Vladimir; Zhao, Liangji; Galkin, Sergei; Kim, Jin-Soo

    2016-10-01

    FAR-TECH, Inc. is developing a suite of full wave RF plasma codes. It is based on a meshless formulation in configuration space with adapted cloud of computational points (CCP) capability and using the hot plasma conductivity kernel to model the nonlocal plasma dielectric response. The conductivity kernel is calculated by numerically integrating the linearized Vlasov equation along unperturbed particle trajectories. Work has been done on the following calculations: 1) the conductivity kernel in hot plasmas, 2) a monitor function based on analytic solutions of the cold-plasma dispersion relation, 3) an adaptive CCP based on the monitor function, 4) stencils to approximate the wave equations on the CCP, 5) the solution to the full wave equations in the cold-plasma model in tokamak geometry for ECRH and ICRH range of frequencies, and 6) the solution to the wave equations using the calculated hot plasma conductivity kernel. We will present results on using a meshless formulation on adaptive CCP to solve the wave equations and on implementing the non-local hot plasma dielectric response to the wave equations. The presentation will include numerical results of wave propagation and absorption in the cold and hot tokamak plasma RF models, using DIII-D geometry and plasma parameters. Work is supported by the U.S. DOE SBIR program.

  14. Solution- and solid-phase parallel synthesis of 4-alkoxy-substituted pyrimidines with high molecular diversity.

    PubMed

    Font, David; Heras, Montserrat; Villalgordo, José M

    2003-01-01

    A simple and straightforward methodology toward the synthesis of novel 2,6-disubstituted-4-alkoxypyrimidine derivatives of type 16 and 19 has been developed. This methodology, initially developed in solution, can be perfectly adapted to the solid support under analogous conditions, taking full advantage of automated parallel synthesis systems. This successful methodology benefits from the key role played by the thioether linkage placed at the 2-position in 3, 9, or 13 in two ways: on the one hand, the steric effect exerted by the thioether linkage is likely to be responsible for the very high observed selectivity toward the formation of the O-alkylation products. On the other hand, this sulfur linkage can serve not only as a robust point of attachment for the heterocycle, stable to a number of reaction conditions, but also as a means of introducing a new element of diversity through activation to the corresponding sulfone (safety-catch linker concept) and subsequent ipso-substitution reaction with a variety of different N-nucleophiles.

  15. Reconfigurable Model Execution in the OpenMDAO Framework

    NASA Technical Reports Server (NTRS)

    Hwang, John T.

    2017-01-01

    NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables (i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model). Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems. Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.

  16. Modelling Detailed-Chemistry Effects on Turbulent Diffusion Flames using a Parallel Solution-Adaptive Scheme

    NASA Astrophysics Data System (ADS)

    Jha, Pradeep Kumar

    Capturing the effects of detailed-chemistry on turbulent combustion processes is a central challenge faced by the numerical combustion community. However, the inherent complexity and non-linear nature of both turbulence and chemistry require that combustion models rely heavily on engineering approximations to remain computationally tractable. This thesis proposes a computationally efficient algorithm for modelling detailed-chemistry effects in turbulent diffusion flames and numerically predicting the associated flame properties. The cornerstone of this combustion modelling tool is the use of parallel Adaptive Mesh Refinement (AMR) scheme with the recently proposed Flame Prolongation of Intrinsic low-dimensional manifold (FPI) tabulated-chemistry approach for modelling complex chemistry. The effect of turbulence on the mean chemistry is incorporated using a Presumed Conditional Moment (PCM) approach based on a beta-probability density function (PDF). The two-equation k-ω turbulence model is used for modelling the effects of the unresolved turbulence on the mean flow field. The finite-rate of methane-air combustion is represented here by using the GRI-Mech 3.0 scheme. This detailed mechanism is used to build the FPI tables. A state of the art numerical scheme based on a parallel block-based solution-adaptive algorithm has been developed to solve the Favre-averaged Navier-Stokes (FANS) and other governing partial-differential equations using a second-order accurate, fully-coupled finite-volume formulation on body-fitted, multi-block, quadrilateral/hexahedral mesh for two-dimensional and three-dimensional flow geometries, respectively. A standard fourth-order Runge-Kutta time-marching scheme is used for time-accurate temporal discretizations. Numerical predictions of three different diffusion flames configurations are considered in the present work: a laminar counter-flow flame; a laminar co-flow diffusion flame; and a Sydney bluff-body turbulent reacting flow. Comparisons are made between the predicted results of the present FPI scheme and Steady Laminar Flamelet Model (SLFM) approach for diffusion flames. The effects of grid resolution on the predicted overall flame solutions are also assessed. Other non-reacting flows have also been considered to further validate other aspects of the numerical scheme. The present schemes predict results which are in good agreement with published experimental results and reduces the computational cost involved in modelling turbulent diffusion flames significantly, both in terms of storage and processing time.

  17. Lattice Boltzmann model for simulation of magnetohydrodynamics

    NASA Technical Reports Server (NTRS)

    Chen, Shiyi; Chen, Hudong; Martinez, Daniel; Matthaeus, William

    1991-01-01

    A numerical method, based on a discrete Boltzmann equation, is presented for solving the equations of magnetohydrodynamics (MHD). The algorithm provides advantages similar to the cellular automaton method in that it is local and easily adapted to parallel computing environments. Because of much lower noise levels and less stringent requirements on lattice size, the method appears to be more competitive with traditional solution methods. Examples show that the model accurately reproduces both linear and nonlinear MHD phenomena.
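
    The discrete Boltzmann evolution referred to here is commonly written, in the later-standard single-relaxation-time (BGK) form,

        f_i(x + e_i \Delta t, t + \Delta t) - f_i(x, t) = -\frac{1}{\tau} [ f_i(x, t) - f_i^{eq}(x, t) ],

    where the f_i are distribution functions streaming along the lattice directions e_i and \tau sets the transport coefficients. This schematic is ours; the paper's MHD model uses a more general linearized collision operator and additional distributions that carry the magnetic field. The locality of the streaming and collision steps is what makes the scheme, like cellular automata, easy to parallelize.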

  18. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction of the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.
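
    In mortar methods, continuity across a nonconforming interface \Gamma is imposed only weakly: writing u^- and u^+ for the traces from the two sides, the coupling condition is, schematically,

        \int_\Gamma (u^- - u^+)\, \psi \, ds = 0   for all \psi in M_h(\Gamma),

    where M_h(\Gamma) is the mortar (multiplier) space on the interface (our notation, not the paper's). The "intermediate mortar" described above factors the projection between 3-D element faces and this space into two 2-D projection steps, which is what avoids assembling the large 3-D projection matrices.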

  19. Genomics of parallel adaptation at two timescales in Drosophila

    PubMed Central

    Begun, David J.

    2017-01-01

    Two interesting unanswered questions are the extent to which both the broad patterns and genetic details of adaptive divergence are repeatable across species, and the timescales over which parallel adaptation may be observed. Drosophila melanogaster is a key model system for population and evolutionary genomics. Findings from genetics and genomics suggest that recent adaptation to latitudinal environmental variation (on the timescale of hundreds or thousands of years) associated with Out-of-Africa colonization plays an important role in maintaining biological variation in the species. Additionally, studies of interspecific differences between D. melanogaster and its sister species D. simulans have revealed that a substantial proportion of proteins and amino acid residues exhibit adaptive divergence on a roughly few million years long timescale. Here we use population genomic approaches to attack the problem of parallelism between D. melanogaster and a highly diverged congener, D. hydei, on two timescales. D. hydei, a member of the repleta group of Drosophila, is similar to D. melanogaster, in that it too appears to be a recently cosmopolitan species and recent colonizer of high latitude environments. We observed parallelism both for genes exhibiting latitudinal allele frequency differentiation within species and for genes exhibiting recurrent adaptive protein divergence between species. Greater parallelism was observed for long-term adaptive protein evolution and this parallelism includes not only the specific genes/proteins that exhibit adaptive evolution, but extends even to the magnitudes of the selective effects on interspecific protein differences. Thus, despite the roughly 50 million years of time separating D. melanogaster and D. hydei, and despite their considerably divergent biology, they exhibit substantial parallelism, suggesting the existence of a fundamental predictability of adaptive evolution in the genus. PMID:28968391

  1. An Airway Network Flow Assignment Approach Based on an Efficient Multiobjective Optimization Framework

    PubMed Central

    Zhang, Xuejun; Lei, Jiaxing

    2015-01-01

    To reduce airspace congestion and flight delays simultaneously, this paper formulates the airway network flow assignment (ANFA) problem as a multiobjective optimization model and presents a new multiobjective optimization framework to solve it. Firstly, an effective multi-island parallel evolution algorithm with multiple evolution populations is employed to improve the optimization capability. Secondly, the nondominated sorting genetic algorithm II is applied for each population. In addition, a cooperative coevolution algorithm is adapted to divide the ANFA problem into several low-dimensional biobjective optimization problems which are easier to deal with. Finally, in order to maintain the diversity of solutions and to avoid prematurity, a dynamic adjustment operator based on solution congestion degree is specifically designed for the ANFA problem. Simulation results using the real traffic data from China air route network and daily flight plans demonstrate that the proposed approach can improve the solution quality effectively, showing superiority to the existing approaches such as the multiobjective genetic algorithm, the well-known multiobjective evolutionary algorithm based on decomposition, and a cooperative coevolution multiobjective algorithm as well as other parallel evolution algorithms with different migration topology. PMID:26180840

  2. Fast Numerical Solution of the Plasma Response Matrix for Real-time Ideal MHD Control

    DOE PAGES

    Glasser, Alexander; Kolemen, Egemen; Glasser, Alan H.

    2018-03-26

    To help effectuate near real-time feedback control of ideal MHD instabilities in tokamak geometries, a parallelized version of A.H. Glasser’s DCON (Direct Criterion of Newcomb) code is developed. To motivate the numerical implementation, we first solve DCON’s δW formulation with a Hamilton-Jacobi theory, elucidating analytical and numerical features of the ideal MHD stability problem. The plasma response matrix is demonstrated to be the solution of an ideal MHD Riccati equation. We then describe our adaptation of DCON with numerical methods natural to solutions of the Riccati equation, parallelizing it to enable its operation in near real-time. We replace DCON’s serial integration of perturbed modes—which satisfy a singular Euler-Lagrange equation—with a domain-decomposed integration of state transition matrices. Output is shown to match results from DCON with high accuracy, and with computation time < 1 s. Such computational speed may enable active feedback ideal MHD stability control, especially in plasmas whose ideal MHD equilibria evolve with inductive timescale τ ≳ 1 s, as in ITER. Further potential applications of this theory are discussed.
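
    To make the Riccati structure explicit: if the singular Euler-Lagrange equation is recast as a linear Hamiltonian system for the perturbation amplitudes \Xi and their conjugate momenta \Pi,

        \Xi' = A\,\Xi + B\,\Pi,    \Pi' = C\,\Xi - A^{\mathsf{T}}\Pi,

    then the plasma response matrix P = \Pi\,\Xi^{-1} obeys the matrix Riccati equation

        P' = C - A^{\mathsf{T}} P - P A - P B P.

    (The notation here is ours, and the coefficient matrices are built from the δW kernels; see the paper for the exact form.) It is this first-order matrix ODE that the domain-decomposed integration of state transition matrices evaluates in parallel.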

  3. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo³ Framework.

    PubMed

    Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo

    2018-06-08

    Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they often fall short in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.

  4. Eulerian Lagrangian Adaptive Fup Collocation Method for solving the conservative solute transport in heterogeneous porous media

    NASA Astrophysics Data System (ADS)

    Gotovac, Hrvoje; Srzic, Veljko

    2014-05-01

    Contaminant transport in natural aquifers is a complex, multiscale process that is frequently studied using different Eulerian, Lagrangian and hybrid numerical methods. Conservative solute transport is typically modeled using the advection-dispersion equation (ADE). Despite the large number of numerical methods that have been developed to solve it, the accurate numerical solution of the ADE still presents formidable challenges. In particular, current numerical solutions of multidimensional advection-dominated transport in non-uniform velocity fields are affected by one or more of the following problems: numerical dispersion that introduces artificial mixing and dilution, grid orientation effects, unresolved spatial and temporal scales, and unphysical numerical oscillations (e.g., Herrera et al., 2009; Bosso et al., 2012). In this work we present the Eulerian-Lagrangian Adaptive Fup Collocation Method (ELAFCM), based on Fup basis functions and a collocation approach for spatial approximation, together with explicit stabilized Runge-Kutta-Chebyshev temporal integration (the public domain routine SERK2), which is especially well suited for stiff parabolic problems. The spatial adaptive strategy is based on Fup basis functions, which are closely related to wavelets and splines: they are compactly supported, exactly reproduce algebraic polynomials, and enable multiresolution adaptive analysis (MRA). MRA is performed here via the Fup Collocation Transform (FCT), so that at each time step the concentration solution is decomposed using only a few significant Fup basis functions on an adaptive collocation grid with appropriate scales (frequencies) and locations, a desired level of accuracy, and near-minimum computational cost. FCT adds collocation points and higher resolution levels only in sensitive zones with sharp concentration gradients, fronts and/or narrow transition zones. Our recent results show that there is no need to solve a large linear system on the adaptive grid, because each Fup coefficient is obtained from predefined formulas that equate the Fup expansion around the corresponding collocation point with a particular collocation operator based on a few surrounding solution values. Furthermore, each Fup coefficient can be obtained independently, which is perfectly suited for parallel processing. The adaptive grid at each time step is obtained from the solution of the previous time step (or the initial conditions) and an advective Lagrangian step in the current time step, following the velocity field and continuous streamlines. The explicit stabilized routine SERK2 then handles the dispersive Eulerian part of the solution in the current time step on the resulting spatial adaptive grid. The overall adaptive concept therefore requires no large linear systems for the spatial or temporal approximation of conservative transport. Moreover, this new Eulerian-Lagrangian collocation scheme resolves all of the aforementioned numerical problems thanks to its adaptive nature and its ability to control numerical errors in space and time. The proposed method solves advection in a Lagrangian way, eliminating the problems of Eulerian methods, while the optimal collocation grid efficiently describes the solution and boundary conditions, eliminating the need for a large number of particles and other problems of Lagrangian methods. Finally, numerical tests show that this approach yields not only an accurate velocity field but also conservative transport, even in highly heterogeneous porous media, resolving all spatial and temporal scales of the concentration field.
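
    The Fup machinery itself is not reproduced here, but the multiresolution thresholding it relies on can be sketched with any compactly supported hierarchical basis. The Python sketch below uses hat functions and a hierarchical-surplus criterion as stand-ins for the Fup basis and the FCT coefficient formulas (all names and tolerances are illustrative assumptions): a new collocation point is kept only where the coarser-level interpolation misses the solution by more than a tolerance, so points accumulate around sharp fronts.

      import numpy as np

      def adaptive_hierarchical_grid(f, levels=8, eps=1e-3):
          """Adaptively select collocation points for f on [0, 1].

          Hierarchical-surplus criterion: a point at level j is kept only
          if interpolation from the coarser grid misses f there by more
          than eps. Linear (hat-function) interpolation stands in for the
          Fup basis; this is a sketch of the MRA idea, not the FCT."""
          xs = [0.0, 1.0]                      # level-0 collocation points
          ys = [f(0.0), f(1.0)]
          for j in range(1, levels + 1):
              h = 0.5 ** j
              candidates = [(2 * k + 1) * h for k in range(2 ** (j - 1))]
              order = np.argsort(xs)           # coarser grid, sorted for interp
              gx, gy = np.array(xs)[order], np.array(ys)[order]
              for x in candidates:
                  surplus = f(x) - np.interp(x, gx, gy)  # detail coefficient
                  if abs(surplus) > eps:                 # keep only sharp zones
                      xs.append(x)
                      ys.append(f(x))
          return np.array(xs), np.array(ys)

      # A sharp front needs fine resolution only near x = 0.5.
      x, y = adaptive_hierarchical_grid(lambda t: np.tanh(80 * (t - 0.5)), eps=1e-2)
      print(len(x), "adaptive points instead of", 2 ** 8 + 1)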

  5. Balancing exploration, uncertainty and computational demands in many objective reservoir optimization

    NASA Astrophysics Data System (ADS)

    Zatarain Salazar, Jazmin; Reed, Patrick M.; Quinn, Julianne D.; Giuliani, Matteo; Castelletti, Andrea

    2017-11-01

    Reservoir operations are central to our ability to manage river basin systems serving conflicting multi-sectoral demands under increasingly uncertain futures. These challenges motivate the need for new solution strategies capable of effectively and efficiently discovering the multi-sectoral tradeoffs that are inherent to alternative reservoir operation policies. Evolutionary many-objective direct policy search (EMODPS) is gaining importance in this context due to its capability of addressing multiple objectives and its flexibility in incorporating multiple sources of uncertainties. This simulation-optimization framework has high potential for addressing the complexities of water resources management, and it can benefit from current advances in parallel computing and meta-heuristics. This study contributes a diagnostic assessment of state-of-the-art parallel strategies for the auto-adaptive Borg Multi Objective Evolutionary Algorithm (MOEA) to support EMODPS. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple sectoral demands from hydropower production, urban water supply, recreation and environmental flows need to be balanced. Using EMODPS with different parallel configurations of the Borg MOEA, we optimize operating policies over different size ensembles of synthetic streamflows and evaporation rates. As we increase the ensemble size, we increase the statistical fidelity of our objective function evaluations at the cost of higher computational demands. This study demonstrates how to overcome the mathematical and computational barriers associated with capturing uncertainties in stochastic multiobjective reservoir control optimization, where parallel algorithmic search serves to reduce the wall-clock time in discovering high quality representations of key operational tradeoffs. Our results show that emerging self-adaptive parallelization schemes exploiting cooperative search populations are crucial. Such strategies provide a promising new set of tools for effectively balancing exploration, uncertainty, and computational demands when using EMODPS.
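
    The Borg MOEA internals are not shown in the record, but the computational pattern that the parallel configurations distribute, independent objective evaluations of many candidate policies over an ensemble of synthetic traces, is easy to sketch. The toy linear policy, cost terms, and function names below are illustrative assumptions, not the study's model; the master-worker layout mirrors one of the simpler parallelization schemes discussed.

      import numpy as np
      from multiprocessing import Pool

      def simulate_policy(args):
          """Evaluate one candidate operating policy over an ensemble of
          synthetic streamflow traces; returns ensemble-mean objectives.
          Hypothetical stand-in for a full reservoir simulation."""
          theta, traces = args
          deficits, spills = [], []
          for q in traces:
              release = np.clip(theta[0] + theta[1] * q, 0.0, q)  # linear policy
              deficits.append(np.mean(np.maximum(1.0 - release, 0.0)))
              spills.append(np.mean(np.maximum(q - release - 2.0, 0.0)))
          return np.mean(deficits), np.mean(spills)

      if __name__ == "__main__":
          rng = np.random.default_rng(1)
          ensemble = [rng.lognormal(0.0, 0.5, size=365) for _ in range(50)]
          population = [rng.uniform(0, 1, size=2) for _ in range(32)]
          with Pool() as pool:  # master-worker: one policy evaluation per task
              objs = pool.map(simulate_policy, [(th, ensemble) for th in population])
          print(min(objs))

    Larger ensembles raise the statistical fidelity of each evaluation at the cost of longer worker tasks, which is exactly the exploration-versus-uncertainty tradeoff the study quantifies.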

  6. Efficient Characterization of Parametric Uncertainty of Complex (Bio)chemical Networks.

    PubMed

    Schillings, Claudia; Sunnåker, Mikael; Stelling, Jörg; Schwab, Christoph

    2015-08-01

    Parametric uncertainty is a particularly challenging and relevant aspect of systems analysis in domains such as systems biology where, both for inference and for assessing prediction uncertainties, it is essential to characterize the system behavior globally in the parameter space. However, current methods based on local approximations or on Monte-Carlo sampling cope only insufficiently with high-dimensional parameter spaces associated with complex network models. Here, we propose an alternative deterministic methodology that relies on sparse polynomial approximations: a deterministic computational interpolation scheme which identifies the most significant expansion coefficients adaptively. We present its performance on kinetic model equations from computational systems biology with several hundred parameters and state variables, leading to numerical approximations of the parametric solution on the entire parameter space. The scheme is based on adaptive Smolyak interpolation of the parametric solution at judiciously and adaptively chosen points in parameter space. Like Monte-Carlo sampling, it is "non-intrusive" and well-suited for massively parallel implementation, but it affords higher convergence rates. This opens up new avenues for large-scale dynamic network analysis by enabling scaling for many applications, including parameter estimation, uncertainty quantification, and systems design.
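
    The adaptive Smolyak construction is multidimensional and beyond a short sketch, but its guiding idea, retaining only the significant coefficients of a polynomial expansion, can be illustrated in one parameter dimension. The example below uses a hypothetical response function (not one of the paper's kinetic models), fits a Chebyshev expansion, and discards negligible coefficients; it is a caricature of coefficient selection, not the Smolyak algorithm.

      import numpy as np
      from numpy.polynomial import chebyshev as C

      def sparse_cheb_fit(f, degree=60, tol=1e-8):
          """Fit f on [-1, 1] in Chebyshev polynomials and keep only the
          coefficients whose magnitude exceeds tol: a 1-D caricature of
          adaptively selecting 'most significant expansion coefficients'."""
          x = np.cos(np.pi * (np.arange(degree + 1) + 0.5) / (degree + 1))
          coeffs = C.chebfit(x, f(x), degree)
          keep = np.abs(coeffs) > tol
          return np.where(keep, coeffs, 0.0), keep.sum()

      # A smooth 'parametric response': few terms carry almost all the mass.
      f = lambda p: 1.0 / (1.0 + 25.0 * p**2)
      sparse, n = sparse_cheb_fit(f, tol=1e-6)
      xt = np.linspace(-1, 1, 1000)
      print(n, "active terms, max error",
            np.max(np.abs(C.chebval(xt, sparse) - f(xt))))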

  7. Efficient Characterization of Parametric Uncertainty of Complex (Bio)chemical Networks

    PubMed Central

    Schillings, Claudia; Sunnåker, Mikael; Stelling, Jörg; Schwab, Christoph

    2015-01-01

    Parametric uncertainty is a particularly challenging and relevant aspect of systems analysis in domains such as systems biology where, both for inference and for assessing prediction uncertainties, it is essential to characterize the system behavior globally in the parameter space. However, current methods based on local approximations or on Monte-Carlo sampling cope only insufficiently with high-dimensional parameter spaces associated with complex network models. Here, we propose an alternative deterministic methodology that relies on sparse polynomial approximations: a deterministic computational interpolation scheme which identifies the most significant expansion coefficients adaptively. We present its performance on kinetic model equations from computational systems biology with several hundred parameters and state variables, leading to numerical approximations of the parametric solution on the entire parameter space. The scheme is based on adaptive Smolyak interpolation of the parametric solution at judiciously and adaptively chosen points in parameter space. Like Monte-Carlo sampling, it is “non-intrusive” and well-suited for massively parallel implementation, but it affords higher convergence rates. This opens up new avenues for large-scale dynamic network analysis by enabling scaling for many applications, including parameter estimation, uncertainty quantification, and systems design. PMID:26317784

  8. Fast adaptive composite grid methods on distributed parallel architectures

    NASA Technical Reports Server (NTRS)

    Lemke, Max; Quinlan, Daniel

    1992-01-01

    The fast adaptive composite (FAC) grid method is compared with the asynchronous fast adaptive composite (AFAC) grid method under a variety of conditions, including vectorization and parallelization. Results are given for distributed-memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment are properties of the algorithm and do not depend on the peculiarities of any machine.

  9. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper describes the rules used in designing algorithms for parallel computing and discusses the origins of the idea of using graphics processors in large-scale processing of laser scanning data. The next part presents the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated by parallel computing. The processing options comprised the generation of orthophotos from point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for quality control. Most algorithms had to be formulated from scratch to meet the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research. Processing time was determined for each process for a typical quantity of data, which confirmed the high efficiency of the proposed solutions and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing opens new opportunities in the creation and organization of processing methods for laser scanning data.

  10. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages of each. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  11. Multiple-try differential evolution adaptive Metropolis for efficient solution of highly parameterized models

    NASA Astrophysics Data System (ADS)

    Eric, L.; Vrugt, J. A.

    2010-12-01

    Spatially distributed hydrologic models potentially contain hundreds of parameters that need to be derived by calibration against a historical record of input-output data. The quality of this calibration strongly determines the predictive capability of the model and thus its usefulness for science-based decision making and forecasting. Unfortunately, high-dimensional optimization problems are typically difficult to solve. Here we present our recent developments to the Differential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2009) to enable efficient solution of high-dimensional parameter estimation problems. The algorithm samples from an archive of past states (Ter Braak and Vrugt, 2008), and uses multiple-try Metropolis sampling (Liu et al., 2000) to decrease the required burn-in time of each individual chain and increase the efficiency of posterior sampling. This approach is hereafter referred to as MT-DREAM. We present results for 2 synthetic mathematical case studies and 2 real-world examples involving from 10 to 240 parameters. Results for those cases show that our multiple-try sampler, MT-DREAM, can consistently find better solutions than other Bayesian MCMC methods. Moreover, MT-DREAM is admirably suited to be implemented and run on a parallel machine and is therefore a powerful method for posterior inference.
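
    The full MT-DREAM sampler layers multiple-try selection on differential-evolution proposals drawn from a chain archive; the sketch below shows only the multiple-try Metropolis step of Liu et al. (2000), with a symmetric Gaussian proposal as a simplifying assumption (so the selection weights reduce to the target density). It is a sketch of the multiple-try mechanics, not the published algorithm.

      import numpy as np

      def mtm_step(x, logp, rng, k=5, sigma=0.5):
          """One multiple-try Metropolis step with a symmetric proposal."""
          ys = x + sigma * rng.standard_normal((k, x.size))   # k trial points
          logw = np.array([logp(y) for y in ys])
          w = np.exp(logw - logw.max())
          j = rng.choice(k, p=w / w.sum())    # pick one trial, weight ~ pi(y)
          # reference set: k-1 fresh draws around the selected point, plus x
          zs = ys[j] + sigma * rng.standard_normal((k - 1, x.size))
          logw_ref = np.array([logp(z) for z in zs] + [logp(x)])
          # generalized Metropolis ratio of summed weights (log-sum-exp form)
          a = np.exp(logw.max() + np.log(w.sum())
                     - logw_ref.max()
                     - np.log(np.exp(logw_ref - logw_ref.max()).sum()))
          return (ys[j], True) if rng.random() < a else (x, False)

      rng = np.random.default_rng(0)
      logp = lambda th: -0.5 * np.sum(th**2)        # standard Gaussian target
      x = np.full(10, 3.0)
      for _ in range(2000):
          x, _ = mtm_step(x, logp, rng)
      print(np.round(x, 2))

    Because the k trial evaluations are independent, they are the natural unit of parallel work, which is what makes the scheme attractive on a parallel machine.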

  12. Parallel Adaptive Mesh Refinement for High-Order Finite-Volume Schemes in Computational Fluid Dynamics

    NASA Astrophysics Data System (ADS)

    Schwing, Alan Michael

    For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source of error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or the location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is given to applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable to unsteady simulations, and refinement and coarsening of the grid do not impact the conservatism of the underlying numerics. The effects on high-order numerical fluxes of fourth and sixth order are explored. Provided the criteria for refinement are appropriately selected, solutions obtained using adapted meshes show no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage the large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess the scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depends on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable comparisons across a range of regimes. Unsteady and steady applications are considered in both subsonic and supersonic flows. Inviscid and viscous simulations achieve similar results at a much reduced cost when employing dynamic mesh adaptation. Several techniques for guiding adaptation are compared. Detailed analysis of statistics from the instrumented solver enables understanding of the costs associated with adaptation. Adaptive mesh refinement shows promise for the test cases presented here: it can be considerably faster than using conventional grids and provides accurate results, and the procedures for adapting the grid are lightweight enough not to require significant computational time while yielding significant reductions in grid size.
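
    The abstract stresses that refinement and coarsening must not break conservation. A minimal 1-D finite-volume illustration of one adaptation pass is sketched below, with a gradient-jump flag and a minmod-limited reconstruction standing in for the solver's actual refinement criteria (both are assumptions): child-cell averages are assigned so the parent cell's integral is preserved exactly.

      import numpy as np

      def refine_conservative(xl, xr, u, tol=0.05):
          """Flag-and-refine pass for 1-D finite-volume data: split any cell
          whose jump to a neighbor exceeds tol, assigning child averages
          from a limited linear reconstruction so the cell integral is
          unchanged. One adaptation pass, not a full AMR driver."""
          new_xl, new_xr, new_u = [], [], []
          for i in range(len(u)):
              jump = max(abs(u[i] - u[i - 1]) if i > 0 else 0.0,
                         abs(u[i + 1] - u[i]) if i < len(u) - 1 else 0.0)
              if jump > tol:
                  # minmod-limited slope keeps the reconstruction monotone
                  dl = u[i] - u[i - 1] if i > 0 else 0.0
                  dr = u[i + 1] - u[i] if i < len(u) - 1 else 0.0
                  s = np.sign(dl) * min(abs(dl), abs(dr)) if dl * dr > 0 else 0.0
                  xm = 0.5 * (xl[i] + xr[i])
                  # children average back to u[i]: total integral is conserved
                  new_xl += [xl[i], xm]; new_xr += [xm, xr[i]]
                  new_u += [u[i] - 0.25 * s, u[i] + 0.25 * s]
              else:
                  new_xl.append(xl[i]); new_xr.append(xr[i]); new_u.append(u[i])
          return new_xl, new_xr, new_u

      edges = np.linspace(0.0, 1.0, 17)
      u = np.tanh(40 * (0.5 * (edges[:-1] + edges[1:]) - 0.5))
      xl, xr, u2 = refine_conservative(edges[:-1], edges[1:], u)
      print(len(u), "->", len(u2), "cells; integral drift:",
            np.sum(u * np.diff(edges))
            - np.sum(np.array(u2) * (np.array(xr) - np.array(xl))))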

  13. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in the ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations were not previously possible because the time-to-solution fell short, by a factor of more than 10, of completing one physics case in less than 5 days of wall-clock time. Frontier techniques such as nested OpenMP parallelism; adaptive parallel I/O; staging I/O and data reduction using dynamic and asynchronous application interactions; dynamic repartitioning for balancing computational work in pushing particles and in grid-related work; scalable and accurate discretization algorithms for nonlinear Coulomb collisions; and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership-class computer.

  14. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high-aspect-ratio tetrahedra to a general 3D metric specification, without introducing hybrid semi-structured regions, is presented. The elemental operators and higher-level logic are described, along with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000:1 stretching on a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach, as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  15. Parallel and Divergent Evolutionary Solutions for the Optimization of an Engineered Central Metabolism in Methylobacterium extorquens AM1

    PubMed Central

    Carroll, Sean Michael; Chubiz, Lon M.; Agashe, Deepa; Marx, Christopher J.

    2015-01-01

    Bioengineering holds great promise to provide fast and efficient biocatalysts for methanol-based biotechnology, but necessitates proven methods to optimize physiology in engineered strains. Here, we highlight experimental evolution as an effective means for optimizing an engineered Methylobacterium extorquens AM1. Replacement of the native formaldehyde oxidation pathway with a functional analog substantially decreased growth in an engineered Methylobacterium, but growth rapidly recovered after six hundred generations of evolution on methanol. We used whole-genome sequencing to identify the basis of adaptation in eight replicate evolved strains, and examined genomic changes in light of other growth and physiological data. We observed great variety in the numbers and types of mutations that occurred, including instances of parallel mutations at targets that may have been “rationalized” by the bioengineer, plus other “illogical” mutations that demonstrate the ability of evolution to expose unforeseen optimization solutions. Notably, we investigated mutations to RNA polymerase, which provided a massive growth benefit but are linked to highly aberrant transcriptional profiles. Overall, we highlight the power of experimental evolution to present genetic and physiological solutions for strain optimization, particularly in systems where the challenges of engineering are too many or too difficult to overcome via traditional engineering methods. PMID:27682084

  16. Balancing Exploration, Uncertainty Representation and Computational Time in Many-Objective Reservoir Policy Optimization

    NASA Astrophysics Data System (ADS)

    Zatarain-Salazar, J.; Reed, P. M.; Quinn, J.; Giuliani, M.; Castelletti, A.

    2016-12-01

    As we confront the challenges of managing river basin systems with a large number of reservoirs and increasingly uncertain tradeoffs impacting their operations (due to, e.g. climate change, changing energy markets, population pressures, ecosystem services, etc.), evolutionary many-objective direct policy search (EMODPS) solution strategies will need to address the computational demands associated with simulating more uncertainties and therefore optimizing over increasingly noisy objective evaluations. Diagnostic assessments of state-of-the-art many-objective evolutionary algorithms (MOEAs) to support EMODPS have highlighted that search time (or number of function evaluations) and auto-adaptive search are key features for successful optimization. Furthermore, auto-adaptive MOEA search operators are themselves sensitive to having a sufficient number of function evaluations to learn successful strategies for exploring complex spaces and for escaping from local optima when stagnation is detected. Fortunately, recent parallel developments allow coordinated runs that enhance auto-adaptive algorithmic learning and can handle scalable and reliable search with limited wall-clock time, but at the expense of the total number of function evaluations. In this study, we analyze this tradeoff between parallel coordination and depth of search using different parallelization schemes of the Multi-Master Borg on a many-objective stochastic control problem. We also consider the tradeoff between better representing uncertainty in the stochastic optimization, and simplifying this representation to shorten the function evaluation time and allow for greater search. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple competing objectives for hydropower production, urban water supply, recreation and environmental flows need to be balanced. Our results provide guidance for balancing exploration, uncertainty, and computational demands when using the EMODPS framework to discover key tradeoffs within the LSRB system.

  17. Adaptive Mesh Refinement for Microelectronic Device Design

    NASA Technical Reports Server (NTRS)

    Cwik, Tom; Lou, John; Norton, Charles

    1999-01-01

    Finite element and finite volume methods are used in a variety of design simulations when it is necessary to compute fields throughout regions that contain varying materials or geometry. Convergence of the simulation can be assessed by uniformly increasing the mesh density until an observable quantity stabilizes. Depending on the electrical size of the problem, uniform refinement of the mesh may be computationally infeasible due to memory limitations. Similarly, depending on the geometric complexity of the object being modeled, uniform refinement can be inefficient since regions that do not need refinement add to the computational expense. In either case, convergence to the correct (measured) solution is not guaranteed. Adaptive mesh refinement methods attempt to selectively refine the regions of the mesh that are estimated to contain proportionally higher solution errors. The refinement may be obtained by decreasing the element size (h-refinement), by increasing the order of the element (p-refinement), or by a combination of the two (h-p refinement). A successful adaptive strategy refines the mesh to produce an accurate solution, measured against the correct fields, without undue computational expense. This is accomplished by the use of a) reliable a posteriori error estimates, b) hierarchical elements, and c) automatic adaptive mesh generation. Adaptive methods are also useful when problems with multi-scale field variations are encountered. These occur in active electronic devices that have thin doped layers and also when mixed physics is used in the calculation. The mesh needs to be fine at and near the thin layer to capture rapid field or charge variations, but can coarsen away from these layers, where field variations smooth out and charge densities are uniform. This poster will present an adaptive mesh refinement package that runs on parallel computers and is applied to specific microelectronic device simulations. Passive sensors that operate in the infrared portion of the spectrum, as well as active device simulations that model charge transport and Maxwell's equations, will be presented.
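
    A minimal sketch of error-driven h-refinement in the spirit described (1-D, with the midpoint-versus-chord deviation standing in for a true a posteriori estimator, both simplifying assumptions): elements whose indicator exceeds the tolerance are bisected until the estimate falls below it everywhere.

      import numpy as np

      def adapt_h(f, x, eps=1e-3, passes=10):
          """Equidistributing h-refinement sketch: estimate the local error
          of a piecewise-linear approximation from the deviation of the
          midpoint value from the chord, and bisect flagged elements."""
          for _ in range(passes):
              y = f(x)
              mid = 0.5 * (x[:-1] + x[1:])
              eta = np.abs(f(mid) - 0.5 * (y[:-1] + y[1:]))  # element indicator
              if eta.max() <= eps:
                  break
              new = mid[eta > eps]            # bisect flagged elements only
              x = np.sort(np.concatenate([x, new]))
          return x

      # a thin-layer-like feature: the mesh coarsens away from it by itself
      x = adapt_h(lambda t: np.exp(-200 * (t - 0.3) ** 2), np.linspace(0, 1, 9))
      print(len(x), "nodes, smallest element", np.min(np.diff(x)))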

  18. The effect of selection environment on the probability of parallel evolution.

    PubMed

    Bailey, Susan F; Rodrigue, Nicolas; Kassen, Rees

    2015-06-01

    Across the great diversity of life, there are many compelling examples of parallel and convergent evolution: similar evolutionary changes arising in independently evolving populations. Parallel evolution is often taken to be strong evidence of adaptation occurring in populations that are highly constrained in their genetic variation. Theoretical models suggest a few potential factors driving the probability of parallel evolution, but experimental tests are needed. In this study, we quantify the degree of parallel evolution in 15 replicate populations of Pseudomonas fluorescens evolved in five different environments that varied in resource type and arrangement. We identified repeat changes across multiple levels of biological organization, from phenotype to gene to nucleotide, and tested the impact of 1) the selection environment, 2) the degree of adaptation, and 3) the degree of heterogeneity in the environment on the degree of parallel evolution at the gene level. We saw, as expected, that parallel evolution occurred more often between populations evolved in the same environment; however, the extent of parallel evolution varied widely. The degree of adaptation did not significantly explain variation in the extent of parallelism in our system, but the number of available beneficial mutations correlated negatively with parallel evolution. In addition, the degree of parallel evolution was significantly higher in populations evolved in a spatially structured, multiresource environment, suggesting that environmental heterogeneity may be an important factor constraining adaptation. Overall, our results stress the importance of environment in driving parallel evolutionary changes and point to a number of avenues for future work on understanding when evolution is predictable.

  19. Final Technical Report for "Applied Mathematics Research: Simulation Based Optimization and Application to Electromagnetic Inverse Problems"

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haber, Eldad

    2014-03-17

    The focus of the research was: developing adaptive meshes for the solution of Maxwell's equations; developing a parallel framework for time-dependent inverse Maxwell's equations; developing multilevel methods for optimization problems with inequality constraints; a new inversion code for inverse Maxwell's equations at the 0th frequency (DC resistivity); and a new inversion code for inverse Maxwell's equations in the low-frequency regime. Although the research concentrated on electromagnetic forward and inverse problems, the results were also applied to the problem of image registration.

  20. A finite-difference method for the variable coefficient Poisson equation on hierarchical Cartesian meshes

    NASA Astrophysics Data System (ADS)

    Raeli, Alice; Bergmann, Michel; Iollo, Angelo

    2018-02-01

    We consider problems governed by a linear elliptic equation with coefficients that vary across internal interfaces. The solution and its normal derivative can undergo significant variations through these internal boundaries. We present a compact finite-difference scheme on a tree-based adaptive grid that can be efficiently solved using a natively parallel data structure. The main idea is to optimize the truncation error of the discretization scheme as a function of the local grid configuration to achieve second-order accuracy. Numerical illustrations are presented in two- and three-dimensional configurations.
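
    For a variable-coefficient elliptic operator, the key discretization choice is how the coefficient is sampled at cell interfaces when it jumps. The sketch below is a uniform-grid 1-D illustration (an assumption; the paper's scheme is tree-based and multidimensional) using the standard harmonic interface mean, which keeps the flux k u' continuous across jumps in k.

      import numpy as np

      def poisson_var_coeff(k, f, n=200):
          """Second-order finite differences for -(k(x) u')' = f(x) on (0, 1)
          with u(0) = u(1) = 0. The coefficient is sampled at interfaces
          with a harmonic mean to handle internal jumps in k."""
          h = 1.0 / (n + 1)
          x = np.linspace(0.0, 1.0, n + 2)
          kc = k(x)                                          # k at the nodes
          kh = 2.0 * kc[:-1] * kc[1:] / (kc[:-1] + kc[1:])   # harmonic interface mean
          A = np.zeros((n, n))
          for i in range(n):
              A[i, i] = (kh[i] + kh[i + 1]) / h**2
              if i > 0:
                  A[i, i - 1] = -kh[i] / h**2
              if i < n - 1:
                  A[i, i + 1] = -kh[i + 1] / h**2
          u = np.linalg.solve(A, f(x[1:-1]))
          return x, np.concatenate([[0.0], u, [0.0]])

      # coefficient jumping by 100x across an internal interface at x = 0.5
      x, u = poisson_var_coeff(lambda t: np.where(t < 0.5, 1.0, 100.0),
                               lambda t: np.ones_like(t))
      print(u.max())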

  1. Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widlund, Olof B.

    2015-06-09

    The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electromagnetics. These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large-scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equations. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver for a coarse model in order to achieve performance that is independent of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.
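
    The report's preconditioners target Maxwell-type systems, but the structure they share, local subdomain solves plus a mandatory coarse solve, can be sketched on a 1-D Laplacian. The non-overlapping block partition and piecewise-constant coarse space below are illustrative simplifications, not the project's algorithms.

      import numpy as np

      def two_level_additive_schwarz(A, parts):
          """Two-level additive Schwarz preconditioner: local solves on each
          (non-overlapping, for brevity) block plus a coarse solve on a
          piecewise-constant coarse space -- the component the report notes
          is needed for processor-count-independent convergence."""
          blocks = [np.linalg.inv(A[np.ix_(p, p)]) for p in parts]
          R0 = np.zeros((len(parts), A.shape[0]))
          for j, p in enumerate(parts):
              R0[j, p] = 1.0
          A0inv = np.linalg.inv(R0 @ A @ R0.T)
          def apply(r):
              z = np.zeros_like(r)
              for p, Binv in zip(parts, blocks):       # local (parallel) solves
                  z[p] += Binv @ r[p]
              return z + R0.T @ (A0inv @ (R0 @ r))     # coarse correction
          return apply

      def pcg(A, b, Minv, tol=1e-10, maxit=500):
          """Preconditioned conjugate gradients with a callable M^{-1}."""
          x = np.zeros_like(b); r = b.copy(); z = Minv(r); p = z.copy()
          rz = r @ z
          for it in range(maxit):
              Ap = A @ p; alpha = rz / (p @ Ap)
              x += alpha * p; r -= alpha * Ap
              if np.linalg.norm(r) < tol:
                  return x, it
              z = Minv(r); rz_new = r @ z
              p = z + (rz_new / rz) * p; rz = rz_new
          return x, maxit

      n, nsub = 256, 8
      A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Laplacian
      parts = np.array_split(np.arange(n), nsub)
      x, iters = pcg(A, np.ones(n), two_level_additive_schwarz(A, parts))
      print("converged in", iters, "iterations")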

  2. Optimal Design of Passive Power Filters Based on Pseudo-parallel Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Li, Pei; Li, Hongbo; Gao, Nannan; Niu, Lin; Guo, Liangfeng; Pei, Ying; Zhang, Yanyan; Xu, Minmin; Chen, Kerui

    2017-05-01

    The economic costs together with filter efficiency are taken as the targets in optimizing the parameters of the passive filter. The method adopted in this paper combines a pseudo-parallel genetic algorithm with an adaptive genetic algorithm: in the early stages, the pseudo-parallel genetic algorithm is used to increase population diversity, and in the late stages the adaptive genetic algorithm is used to reduce the workload. At the same time, the migration rate of the pseudo-parallel genetic algorithm is modified so that it changes adaptively with population diversity. Simulation results show that the filter designed by the proposed method has a better filtering effect at lower economic cost, and can be used in engineering.
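
    The paper's filter-design objective is not reproduced here, but the algorithmic skeleton, several GA populations evolving in parallel with a migration rate that adapts to population diversity, can be sketched on a toy cost function. All parameters, the ring migration topology, and the diversity measure below are assumptions for illustration.

      import numpy as np

      rng = np.random.default_rng(7)
      cost = lambda x: np.sum(x**2, axis=-1)            # stand-in design cost

      def step(pop, pm=0.1):
          """One GA generation on one island: binary-tournament selection,
          blend crossover, Gaussian mutation."""
          n = len(pop)
          i, j = rng.integers(0, n, (2, n))
          parents = np.where((cost(pop[i]) < cost(pop[j]))[:, None],
                             pop[i], pop[j])
          kids = 0.5 * (parents + parents[rng.permutation(n)])     # blend
          kids += (rng.random(pop.shape) < pm) * rng.normal(0, 0.3, pop.shape)
          return kids

      islands = [rng.uniform(-5, 5, (30, 4)) for _ in range(4)]    # pseudo-parallel
      for gen in range(200):
          islands = [step(p) for p in islands]
          for k, p in enumerate(islands):
              # migration rate adapts to diversity: homogeneous islands import more
              diversity = p.std(axis=0).mean()
              n_mig = int(np.clip(5 * np.exp(-diversity), 0, 5))
              if n_mig:
                  donor = islands[(k + 1) % len(islands)]
                  worst = np.argsort(cost(p))[-n_mig:]
                  p[worst] = donor[np.argsort(cost(donor))[:n_mig]]  # best in, worst out
      print(min(cost(p).min() for p in islands))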

  3. Massive problem reports mining and analysis based parallelism for similar search

    NASA Astrophysics Data System (ADS)

    Zhou, Ya; Hu, Cailin; Xiong, Han; Wei, Xiafei; Li, Ling

    2017-05-01

    Massive numbers of problem reports and solutions, accumulated over time and continuously collected in XML Spreadsheet (XMLSS) format from enterprises and organizations, record comprehensive descriptions of problems that can help technicians trace problems and their solutions. Effectively managing and analyzing these massive semi-structured data, so as to provide solutions to similar problems, support decisions on immediate problems, and assist product optimization during hardware and software maintenance, is a significant and challenging issue. For this purpose, we build a data management system to manage, mine, and analyze these data; search results are categorized and organized so that users can quickly find where the results of interest are located. Experimental results demonstrate that this system substantially outperforms a traditional centralized management system in both performance and adaptability to heterogeneous data. Moreover, by re-extracting topics, it describes each cluster more precisely.

  4. Self-organization in neural networks - Applications in structural optimization

    NASA Technical Reports Server (NTRS)

    Hajela, Prabhat; Fu, B.; Berke, Laszlo

    1993-01-01

    The present paper discusses the applicability of ART (Adaptive Resonance Theory) networks, and the Hopfield and Elastic networks, to problems of structural analysis and design. A characteristic of these network architectures is the ability to classify patterns presented as inputs into specific categories, and the categories may themselves represent distinct procedural solution strategies. The paper shows how this property can be exploited in structural analysis and design problems. A second application is the use of Hopfield and Elastic networks in optimization problems, of particular interest being problems characterized by the presence of discrete and integer design variables. The parallel computing architecture that is typical of neural networks is shown to be effective in such problems. Results of preliminary implementations in structural design problems are also included in the paper.

  5. Numerical Analysis of Ginzburg-Landau Models for Superconductivity.

    NASA Astrophysics Data System (ADS)

    Coskun, Erhan

    Thin-film conventional, as well as high-Tc, superconductors of various geometric shapes placed under both uniform and variable-strength magnetic fields are studied using the universally accepted macroscopic Ginzburg-Landau model. A series of new theoretical results concerning the properties of the solution is presented using the semi-discrete time-dependent Ginzburg-Landau equations, a staggered grid setup, and natural boundary conditions. Efficient serial algorithms, including a novel adaptive algorithm, are developed and successfully implemented for solving the governing highly nonlinear parabolic system of equations. The refinement technique used in the adaptive algorithm is based on a modified forward Euler method, which we also developed to ease the restriction on time step size imposed by stability considerations. Stability and convergence properties of the forward and modified forward Euler schemes are studied. Numerical simulations of various recent physical experiments of technological importance, such as vortex motion and pinning, are performed. The numerical code for solving the time-dependent Ginzburg-Landau equations is parallelized using BlockComm-Chameleon and PCN. The parallel code was run on the distributed-memory multiprocessors Intel iPSC/860, IBM SP1, and a cluster of Sun Sparc workstations, all located at the Mathematics and Computer Science Division, Argonne National Laboratory.

  6. A Cost-Effective Vehicle Localization Solution Using an Interacting Multiple Model-Unscented Kalman Filters (IMM-UKF) Algorithm and Grey Neural Network

    PubMed Central

    Xu, Qimin; Li, Xu; Chan, Ching-Yao

    2017-01-01

    In this paper, we propose a cost-effective localization solution for land vehicles, which can simultaneously adapt to the uncertain noise of inertial sensors and bridge Global Positioning System (GPS) outages. First, three Unscented Kalman filters (UKFs) with different noise covariances are introduced into the framework of Interacting Multiple Model (IMM) algorithm to form the proposed IMM-based UKF, termed as IMM-UKF. The IMM algorithm can provide a soft switching among the three UKFs and therefore adapt to different noise characteristics. Further, two IMM-UKFs are executed in parallel when GPS is available. One fuses the information of low-cost GPS, in-vehicle sensors, and micro electromechanical system (MEMS)-based reduced inertial sensor systems (RISS), while the other fuses only in-vehicle sensors and MEMS-RISS. The differences between the state vectors of the two IMM-UKFs are considered as training data of a Grey Neural Network (GNN) module, which is known for its high prediction accuracy with a limited amount of samples. The GNN module can predict and compensate position errors when GPS signals are blocked. To verify the feasibility and effectiveness of the proposed solution, road-test experiments with various driving scenarios were performed. The experimental results indicate that the proposed solution outperforms all the compared methods. PMID:28629165
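
    A full IMM-UKF requires the unscented filter itself; the sketch below isolates only the IMM layer, the mixing of per-model Gaussian estimates and the model-probability update, which is what lets a bank of differently tuned filters switch softly between noise regimes. The transition matrix and per-model numbers are illustrative assumptions, not values from the paper.

      import numpy as np

      def imm_mix(mu, trans, means, covs):
          """IMM interaction step: mix each model's state with the others,
          weighted by mixing probabilities from the Markov transition
          matrix. In the paper each model is a UKF with a different noise
          covariance; here the per-model estimates are plain Gaussians."""
          c = trans.T @ mu                              # predicted model probs
          mix = trans * mu[:, None] / c[None, :]        # mu_{i|j}
          mixed_means = np.array([mix[:, j] @ means for j in range(len(mu))])
          mixed_covs = []
          for j in range(len(mu)):
              P = np.zeros_like(covs[0])
              for i in range(len(mu)):
                  d = (means[i] - mixed_means[j])[:, None]
                  P += mix[i, j] * (covs[i] + d @ d.T)  # spread-of-means term
              mixed_covs.append(P)
          return c, mixed_means, np.array(mixed_covs)

      def imm_update(c, likelihoods):
          """Model-probability update from each filter's measurement likelihood."""
          mu = c * likelihoods
          return mu / mu.sum()

      # three models, e.g. three UKFs with small/medium/large noise covariance
      mu = np.array([0.6, 0.3, 0.1])
      trans = np.array([[0.90, 0.05, 0.05],
                        [0.05, 0.90, 0.05],
                        [0.05, 0.05, 0.90]])
      means = np.array([[0.0, 1.0], [0.2, 1.1], [0.4, 0.9]])
      covs = np.array([np.eye(2) * s for s in (0.1, 0.5, 2.0)])
      c, m, P = imm_mix(mu, trans, means, covs)  # m, P feed each model's filter
      print(imm_update(c, likelihoods=np.array([0.9, 0.5, 0.1])))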

  7. Adapting high-level language programs for parallel processing using data flow

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  8. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  9. Meeting review. Uncovering the genetic basis of adaptive change: on the intersection of landscape genomics and theoretical population genetics.

    PubMed

    Joost, Stéphane; Vuilleumier, Séverine; Jensen, Jeffrey D; Schoville, Sean; Leempoel, Kevin; Stucki, Sylvie; Widmer, Ivo; Melodelima, Christelle; Rolland, Jonathan; Manel, Stéphanie

    2013-07-01

    A workshop recently held at the École Polytechnique Fédérale de Lausanne (EPFL, Switzerland) was dedicated to understanding the genetic basis of adaptive change, taking stock of the different approaches developed in theoretical population genetics and landscape genomics and bringing together knowledge accumulated in both research fields. An important challenge in theoretical population genetics is to incorporate the effects of demographic history and population structure, and important design problems (e.g., the focus on populations as units, the focus on hard selective sweeps, and the lack of a hypothesis-based framework in the design of the statistical tests) reduce the capability of these approaches to detect adaptive genetic variation. In parallel, landscape genomics offers a solution to several of these problems and provides a number of advantages (e.g., fast computation, integration of landscape heterogeneity). But this approach makes several implicit assumptions that should be carefully considered (e.g., that selection has had enough time to create a functional relationship between the allele distribution and the environmental variable, or that this functional relationship is constant). To address the respective strengths and weaknesses mentioned above, the workshop brought together a panel of experts from both disciplines to present their work and discuss the relevance of combining these approaches, possibly resulting in a joint software solution in the future.

  10. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computation, to be used as a software package for real-time control of flexible space structures. A brief introduction to the state of the art in parallel computational capability is also presented. Time-marching strategies are developed for effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented on applications in disciplines other than the aerospace industry is assessed.

  11. Nonlinearly preconditioned semismooth Newton methods for variational inequality solution of two-phase flow in porous media

    NASA Astrophysics Data System (ADS)

    Yang, Haijian; Sun, Shuyu; Yang, Chao

    2017-03-01

    Most existing methods for solving two-phase flow problems in porous media do not take the physically feasible saturation fractions between 0 and 1 into account, which often destroys the numerical accuracy and physical interpretability of the simulation. To calculate the solution without the loss of this basic requirement, we introduce a variational inequality formulation of the saturation equilibrium with a box inequality constraint, and use a conservative finite element method for the spatial discretization and a backward differentiation formula with adaptive time stepping for the temporal integration. The resulting variational inequality system at each time step is solved by using a semismooth Newton algorithm. To accelerate the Newton convergence and improve the robustness, we employ a family of adaptive nonlinear elimination methods as a nonlinear preconditioner. Some numerical results are presented to demonstrate the robustness and efficiency of the proposed algorithm. A comparison is also included to show the superiority of the proposed fully implicit approach over the classical IMplicit Pressure-Explicit Saturation (IMPES) method in terms of the time step size and the total execution time measured on a parallel computer.
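
    The paper applies semismooth Newton to a discretized two-phase flow system; the Newton mechanics are easiest to see on a small linear complementarity problem. The min-function reformulation and row-wise generalized Jacobian below are the standard construction, sketched here on an assumed test matrix rather than the paper's finite element system (and without the nonlinear elimination preconditioning it adds).

      import numpy as np

      def semismooth_newton_lcp(M, q, tol=1e-10, maxit=50):
          """Semismooth Newton for the complementarity system
              x >= 0,  F(x) = M x + q >= 0,  x . F(x) = 0,
          via the nonsmooth reformulation Phi(x) = min(x, F(x)) = 0.
          An element of the generalized Jacobian is assembled row by row:
          identity rows where x_i <= F_i(x), rows of M otherwise."""
          n = len(q)
          x = np.zeros(n)
          for it in range(maxit):
              Fx = M @ x + q
              phi = np.minimum(x, Fx)
              if np.linalg.norm(phi) < tol:
                  return x, it
              J = np.where((x <= Fx)[:, None], np.eye(n), M)
              x = x - np.linalg.solve(J, phi)
          return x, maxit

      rng = np.random.default_rng(3)
      B = rng.standard_normal((20, 20))
      M = B @ B.T + 20 * np.eye(20)      # positive definite => unique solution
      q = rng.standard_normal(20)
      x, iters = semismooth_newton_lcp(M, q)
      print("iterations:", iters, " complementarity:", float(x @ (M @ x + q)))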

  12. A Parallel Implementation of Multilevel Recursive Spectral Bisection for Application to Adaptive Unstructured Meshes. Chapter 1

    NASA Technical Reports Server (NTRS)

    Barnard, Stephen T.; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    The design of a parallel implementation of multilevel recursive spectral bisection is described. The goal is to implement a code that is fast enough to enable dynamic repartitioning of adaptive meshes.
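
    A single level of the recursive step can be sketched directly: form the graph Laplacian, take the Fiedler vector, and split at its median. The multilevel variant described in the chapter coarsens the graph first so each eigenproblem stays small; that part, and the parallelization, are omitted in this dense single-level sketch.

      import numpy as np

      def spectral_bisection(adj):
          """Split a graph by the sign of the Fiedler vector (eigenvector of
          the second-smallest Laplacian eigenvalue), cutting at the median
          so the two halves are balanced."""
          L = np.diag(adj.sum(axis=1)) - adj          # graph Laplacian
          vals, vecs = np.linalg.eigh(L)
          fiedler = vecs[:, 1]                        # second-smallest eigenpair
          return fiedler >= np.median(fiedler)

      # 2-D grid graph: the expected cut is straight down the middle
      nx, ny = 8, 4
      idx = lambda i, j: i * ny + j
      adj = np.zeros((nx * ny, nx * ny))
      for i in range(nx):
          for j in range(ny):
              if i + 1 < nx:
                  adj[idx(i, j), idx(i + 1, j)] = adj[idx(i + 1, j), idx(i, j)] = 1
              if j + 1 < ny:
                  adj[idx(i, j), idx(i, j + 1)] = adj[idx(i, j + 1), idx(i, j)] = 1
      print(spectral_bisection(adj).reshape(nx, ny).astype(int))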

  13. SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

    1990-01-01

    Attention is given to such topics as an evaluation of block algorithm variants in LAPACK, a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data-parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercubes vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.

  14. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C-based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  15. Parallel evolution of a type IV secretion system in radiating lineages of the host-restricted bacterial pathogen Bartonella.

    PubMed

    Engel, Philipp; Salzburger, Walter; Liesch, Marius; Chang, Chao-Chin; Maruyama, Soichi; Lanz, Christa; Calteau, Alexandra; Lajus, Aurélie; Médigue, Claudine; Schuster, Stephan C; Dehio, Christoph

    2011-02-10

    Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment.

  16. Parallel Evolution of a Type IV Secretion System in Radiating Lineages of the Host-Restricted Bacterial Pathogen Bartonella

    PubMed Central

    Engel, Philipp; Salzburger, Walter; Liesch, Marius; Chang, Chao-Chin; Maruyama, Soichi; Lanz, Christa; Calteau, Alexandra; Lajus, Aurélie; Médigue, Claudine; Schuster, Stephan C.; Dehio, Christoph

    2011-01-01

    Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment. PMID:21347280

  17. Computational mechanics analysis tools for parallel-vector supercomputers

    NASA Technical Reports Server (NTRS)

    Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.

    1993-01-01

    Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.

  18. Design and Analysis of Self-Adapted Task Scheduling Strategies in Wireless Sensor Networks

    PubMed Central

    Guo, Wenzhong; Xiong, Naixue; Chao, Han-Chieh; Hussain, Sajid; Chen, Guolong

    2011-01-01

    In a wireless sensor network (WSN), the usage of resources is usually highly related to the execution of tasks which consume a certain amount of computing and communication bandwidth. Parallel processing among sensors is a promising solution to provide the demanded computation capacity in WSNs. Task allocation and scheduling is a typical problem in the area of high performance computing. Although task allocation and scheduling in wired processor networks has been well studied in the past, their counterparts for WSNs remain largely unexplored. Existing traditional high performance computing solutions cannot be directly implemented in WSNs due to the limitations of WSNs such as limited resource availability and the shared communication medium. In this paper, a self-adapted task scheduling strategy for WSNs is presented. First, a multi-agent-based architecture for WSNs is proposed and a mathematical model of dynamic alliance is constructed for the task allocation problem. Then an effective discrete particle swarm optimization (PSO) algorithm for the dynamic alliance (DPSO-DA) with a well-designed particle position code and fitness function is proposed. A mutation operator which can effectively improve the algorithm’s ability of global search and population diversity is also introduced in this algorithm. Finally, the simulation results show that the proposed solution can achieve significantly better performance than other algorithms. PMID:22163971
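
    The DPSO-DA particle coding and dynamic-alliance fitness are specific to the paper; the sketch below is a generic discrete PSO with a mutation operator for diversity, applied to a toy task-to-sensor load-balancing objective. All names, probabilities, and the fitness function are assumptions for illustration only.

      import numpy as np

      rng = np.random.default_rng(5)
      n_tasks, n_sensors, n_particles = 12, 4, 30

      def fitness(assign):
          """Toy load-imbalance objective (lower is better): a stand-in
          for the paper's dynamic-alliance fitness function."""
          loads = np.array([(assign == s).sum() for s in range(n_sensors)])
          return loads.max() - loads.min()

      # discrete PSO: a particle's position is a task-to-sensor assignment
      pos = rng.integers(0, n_sensors, (n_particles, n_tasks))
      pbest = pos.copy()
      gbest = pos[np.argmin([fitness(p) for p in pos])].copy()
      for it in range(200):
          for k in range(n_particles):
              # discrete "velocity": copy each gene from pbest/gbest/random
              r = rng.random(n_tasks)
              new = np.where(r < 0.5, pbest[k],
                    np.where(r < 0.85, gbest,
                             rng.integers(0, n_sensors, n_tasks)))
              # mutation operator guards population diversity
              mut = rng.random(n_tasks) < 0.05
              new[mut] = rng.integers(0, n_sensors, mut.sum())
              pos[k] = new
              if fitness(new) < fitness(pbest[k]):
                  pbest[k] = new
              if fitness(new) < fitness(gbest):
                  gbest = new.copy()
      print("best imbalance:", fitness(gbest), "assignment:", gbest)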

  19. Haptic adaptation to slant: No transfer between exploration modes

    PubMed Central

    van Dam, Loes C. J.; Plaisier, Myrthe A.; Glowania, Catharina; Ernst, Marc O.

    2016-01-01

    Human touch is an inherently active sense: to estimate an object’s shape humans often move their hand across its surface. This way the object is sampled both in a serial (sampling different parts of the object across time) and parallel fashion (sampling using different parts of the hand simultaneously). Both the serial (moving a single finger) and parallel (static contact with the entire hand) exploration modes provide reliable and similar global shape information, suggesting the possibility that this information is shared early in the sensory cortex. In contrast, we here show the opposite. Using an adaptation-and-transfer paradigm, a change in haptic perception was induced by slant-adaptation using either the serial or parallel exploration mode. A unified shape-based coding would predict that this would equally affect perception using other exploration modes. However, we found that adaptation-induced perceptual changes did not transfer between exploration modes. Instead, serial and parallel exploration components adapted simultaneously, but to different kinaesthetic aspects of exploration behaviour rather than object-shape per se. These results indicate that a potential combination of information from different exploration modes can only occur at down-stream cortical processing stages, at which adaptation is no longer effective. PMID:27698392

  20. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    NASA Astrophysics Data System (ADS)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements.
    Catalogue identifier: AFBT_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU GPLv3
    No. of lines in distributed program, including test data, etc.: 913552
    No. of bytes in distributed program, including test data, etc.: 270876249
    Distribution format: tar.gz
    Programming language: CUDA/C, MATLAB
    Computer: Intel x64 CPU, GPU supporting CUDA technology
    Operating system: 64-bit Windows 7 Professional
    Has the code been vectorized or parallelized?: Yes, the CPU code has been vectorized in MATLAB and the CUDA code has been parallelized
    RAM: Dependent on user parameters, typically between several gigabytes and several tens of gigabytes
    Classification: 6.5, 18
    Nature of problem: Speed-up of data processing in optical coherence microscopy
    Solution method: Utilization of GPU for massively parallel data processing
    Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data)
    Running time: 1.8 s for one B-scan (150× faster in comparison to the CPU data processing time)

  1. Widespread parallel population adaptation to climate variation across a radiation: implications for adaptation to climate change.

    PubMed

    Thorpe, Roger S; Barlow, Axel; Malhotra, Anita; Surget-Groba, Yann

    2015-03-01

    Global warming will impact species in a number of ways, and it is important to know the extent to which natural populations can adapt to anthropogenic climate change by natural selection. Parallel microevolution within separate species can demonstrate natural selection, but several studies of homoplasy have not yet revealed examples of widespread parallel evolution across a generic radiation. Taking into account primary phylogeographic divisions, we investigate numerous quantitative traits (size, shape, scalation, colour pattern and hue) in anole radiations from the mountainous Lesser Antillean islands. Adaptation to climatic differences can lead to very pronounced differences between spatially close populations, with all studied traits showing some evidence of parallel evolution. Traits from shape, scalation, pattern and hue (particularly the latter) show widespread evolutionary parallels within these species in response to altitudinal climate variation greater than the extreme anthropogenic climate change predicted for 2080. This gives strong evidence of the ability to adapt to climate variation by natural selection throughout this radiation. As anoles can evolve very rapidly, it suggests that anthropogenic climate change is likely to be less of a conservation threat than other factors, such as habitat loss and invasive species, in this Lesser Antillean biodiversity hot spot.

  2. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Hixon, Duane; Sankar, L. N.

    1993-01-01

    During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist, such as transonic small disturbance (TSD) analyses, transonic full-potential-equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for accelerating a Newton iteration time-marching scheme for unsteady 2-D and 3-D compressible viscous flow calculations; preliminary calculations indicate that this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time-marching schemes. The proposed method has been tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architectures of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
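
    A compact way to see how GMRES accelerates a Newton time-marching scheme is the Jacobian-free Newton-Krylov pattern, sketched below on a steady 1-D viscous Burgers residual (an illustrative stand-in, not the paper's Navier-Stokes solver). GMRES only ever needs Jacobian-vector products, which a finite difference of the residual supplies, so the Jacobian is never formed.

      import numpy as np
      from scipy.sparse.linalg import LinearOperator, gmres

      def newton_gmres(F, u0, tol=1e-10, maxit=20, eps=1e-7):
          """Jacobian-free Newton-Krylov: each Newton correction comes from
          GMRES, with Jacobian-vector products approximated by finite
          differences of the residual."""
          u = u0.copy()
          for it in range(maxit):
              r = F(u)
              if np.linalg.norm(r) < tol:
                  return u, it
              Jv = lambda v: (F(u + eps * v) - r) / eps
              J = LinearOperator((u.size, u.size), matvec=Jv)
              du, _ = gmres(J, -r)          # inexact inner solve is fine
              u += du
          return u, maxit

      # steady 1-D viscous Burgers residual: u u_x - nu u_xx, u(0)=1, u(1)=-1
      n, nu = 101, 0.05
      h = 1.0 / (n - 1)
      def residual(u):
          r = np.zeros_like(u)
          r[0], r[-1] = u[0] - 1.0, u[-1] + 1.0
          r[1:-1] = (u[1:-1] * (u[2:] - u[:-2]) / (2 * h)
                     - nu * (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2)
          return r

      u, iters = newton_gmres(residual, np.linspace(1.0, -1.0, n))
      print("Newton iterations:", iters)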

  3. AMOBH: Adaptive Multiobjective Black Hole Algorithm.

    PubMed

    Wu, Chong; Wu, Tao; Fu, Kaiyuan; Zhu, Yuan; Li, Yongbo; He, Wangyong; Tang, Shengwen

    2017-01-01

    This paper proposes a new multiobjective evolutionary algorithm based on the black hole algorithm with a new individual density assessment (cell density), called the "adaptive multiobjective black hole algorithm" (AMOBH). Cell density has low computational complexity and maintains a good balance between convergence and diversity of the Pareto front. The framework of AMOBH can be divided into three steps. First, the Pareto front is mapped to a new objective space called the parallel cell coordinate system. Then, to adjust the evolutionary strategies adaptively, Shannon entropy is employed to estimate the evolution status. Finally, the cell density is combined with a dominance strength assessment called cell dominance to evaluate the fitness of solutions. Compared with the state-of-the-art methods SPEA-II, PESA-II, NSGA-II, and MOEA/D, experimental results show that AMOBH performs well in terms of convergence rate, population diversity, population convergence, and coverage of different Pareto regions by subpopulations, and, in most cases, in time complexity relative to the latter.

  4. Parallel solution of sparse one-dimensional dynamic programming problems

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
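
    To make the structure concrete, consider the canonical one-dimensional recurrence f[j] = min over i < j of f[i] + cost(i, j). The outer loop over stages is inherently serial, but the inner minimization is data-parallel; vectorizing it, as in this hedged NumPy sketch with a hypothetical stage cost, is the simplest form of the reformulation such work studies.

      import numpy as np

      def dp_1d(cost, n):
          """Evaluate f[j] = min_{i<j} f[i] + cost(i, j), with f[0] = 0."""
          f = np.full(n + 1, np.inf)
          f[0] = 0.0
          for j in range(1, n + 1):              # sequential stages
              i = np.arange(j)                   # all candidate predecessors
              f[j] = np.min(f[i] + cost(i, j))   # parallelizable reduction
          return f[n]

      total = dp_1d(lambda i, j: (j - i) ** 2, 100)   # hypothetical convex cost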

  5. Orion FSW V and V and Kedalion Engineering Lab Insight

    NASA Technical Reports Server (NTRS)

    Mangieri, Mark L.

    2010-01-01

    NASA, along with its prime Orion contractor and its subcontractors, is adapting an avionics system paradigm borrowed from the manned commercial aircraft industry for use in manned space flight systems. Integrated Modular Avionics (IMA) techniques have been proven as a robust avionics solution for manned commercial aircraft (B737/777/787, MD 10/90). This presentation outlines current approaches to adapting IMA, along with its heritage FSW V&V paradigms, into NASA's manned space flight program for Orion. NASA's Kedalion engineering analysis lab is on the forefront of validating many of these contemporary IMA-based techniques. Kedalion has already validated many of the proposed Orion FSW V&V paradigms using Orion's precursory Flight Test Article (FTA) Pad Abort 1 (PA-1) program. The Kedalion lab will evolve its architectures, tools, and techniques in parallel with the evolving Orion program.

  6. DGDFT: A massively parallel method for large scale density functional theory calculations.

    PubMed

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^(-4) Hartree/atom in the energy and 6.2 × 10^(-4) Hartree/bohr in the atomic forces. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we describe in detail.

  7. Discrete Diffusion Monte Carlo for Electron Thermal Transport

    NASA Astrophysics Data System (ADS)

    Chenhall, Jeffrey; Cao, Duc; Wollaeger, Ryan; Moses, Gregory

    2014-10-01

    The iSNB (implicit Schurtz-Nicolai-Busquet) electron thermal transport method of Cao et al. is adapted into a Discrete Diffusion Monte Carlo (DDMC) solution method for eventual inclusion in a hybrid IMC-DDMC (Implicit Monte Carlo) method. The hybrid method will combine the efficiency of a diffusion method in short mean free path regions with the accuracy of a transport method in long mean free path regions. The Monte Carlo nature of the approach allows the algorithm to be massively parallelized. Work to date on the iSNB-DDMC method will be presented. This work was supported by Sandia National Laboratory - Albuquerque.

  8. A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Gao, Guang R.

    1999-01-01

    Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
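
    The appeal of a linear mesh representation is that, once elements are ordered along the walk, dynamic partitioning reduces to cutting a one-dimensional array into chunks of roughly equal weight. The sketch below shows only that reduction, with the walk construction itself assumed; it illustrates the idea and is not the authors' algorithm.

      import numpy as np

      def split_linear_order(weights, nparts):
          """Assign elements (listed in walk order) to contiguous partitions.

          Because each partition is a contiguous run of the walk,
          repartitioning after adaptation only shifts the cut points,
          keeping data migration local.
          """
          cum = np.cumsum(weights)                    # prefix sums of work
          part = (cum * nparts / cum[-1]).astype(int) # ideal cut positions
          return np.minimum(part, nparts - 1)

      parts = split_linear_order(np.random.rand(1000), nparts=8)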

  9. What is adaptive about adaptive decision making? A parallel constraint satisfaction account.

    PubMed

    Glöckner, Andreas; Hilbig, Benjamin E; Jekel, Marc

    2014-12-01

    There is broad consensus that human cognition is adaptive. However, the vital question of how exactly this adaptivity is achieved has remained largely open. Herein, we contrast two frameworks which account for adaptive decision making, namely broad and general single-mechanism accounts vs. multi-strategy accounts. We propose and fully specify a single-mechanism model for decision making based on parallel constraint satisfaction processes (PCS-DM) and contrast it theoretically and empirically against a multi-strategy account. To achieve sufficiently sensitive tests, we rely on a multiple-measure methodology including choice, reaction time, and confidence data as well as eye-tracking. Results show that manipulating the environmental structure produces clear adaptive shifts in choice patterns, as both frameworks would predict. However, results on the process level (reaction time, confidence), in information acquisition (eye-tracking), and from cross-predicting choice consistently corroborate single-mechanism accounts in general, and the proposed parallel constraint satisfaction model for decision making in particular. Copyright © 2014 Elsevier B.V. All rights reserved.

  10. Computation of free energy profiles with parallel adaptive dynamics

    NASA Astrophysics Data System (ADS)

    Lelièvre, Tony; Rousset, Mathias; Stoltz, Gabriel

    2007-04-01

    We propose a formulation of an adaptive computation of free energy differences, in the adaptive biasing force or nonequilibrium metadynamics spirit, using conditional distributions of samples of configurations which evolve in time. This allows us to present a truly unifying framework for these methods, and to prove convergence results for certain classes of algorithms. From a numerical viewpoint, a parallel implementation of these methods is very natural, the replicas interacting through the reconstructed free energy. We demonstrate how to improve this parallel implementation by resorting to some selection mechanism on the replicas. This is illustrated by computations on a model system of conformational changes.

  11. An Adaptive Memory Interface Controller for Improving Bandwidth Utilization of Hybrid and Reconfigurable Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio

    Emerging applications such as data mining, bioinformatics, knowledge discovery, and social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees, or unstructured grids, which generate unpredictable memory accesses. These data structures usually are large but difficult to partition. These applications mostly are memory-bandwidth bound and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each one of the elements they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processors with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom hand-tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flows, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently of their execution time, to maximize memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations. A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, graph Breadth First Search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.

  12. Dual-thread parallel control strategy for ophthalmic adaptive optics.

    PubMed

    Yu, Yongxin; Zhang, Yuhua

    To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme including 2 parallel threads; one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, adaptive optics has achieved a closed loop updating frequency up to 110 Hz, and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope.

  13. Dual-thread parallel control strategy for ophthalmic adaptive optics

    PubMed Central

    Yu, Yongxin; Zhang, Yuhua

    2015-01-01

    To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme including 2 parallel threads; one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, adaptive optics has achieved a closed loop updating frequency up to 110 Hz, and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope. PMID:25866498
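
    The two-thread organization described in these records can be pictured as a small producer/consumer pipeline: one thread acquires slope measurements at the sensor frame rate while the other reconstructs and applies corrections. The following is a hedged Python sketch with stub hardware classes; the class and matrix names are hypothetical stand-ins, not the instrument's API.

      import queue, threading, time
      import numpy as np

      class DummySensor:                       # stand-in for a Shack-Hartmann camera
          def read_slopes(self):
              return np.random.randn(2 * 193)  # x/y slopes for 193 subapertures

      class DummyMirror:                       # stand-in for the wavefront corrector
          def __init__(self):
              self.cmd = np.zeros(97)
          def apply(self, cmd):
              self.cmd = cmd

      slopes_q = queue.Queue(maxsize=1)        # holds only the freshest frame

      def detection_loop(sensor):
          """Thread 1: wavefront detection at the sensor frame rate."""
          while True:
              s = sensor.read_slopes()
              try:
                  slopes_q.get_nowait()        # discard a stale measurement
              except queue.Empty:
                  pass
              slopes_q.put(s)

      def correction_loop(mirror, recon, gain=0.3):
          """Thread 2: wavefront reconstruction and compensation."""
          while True:
              s = slopes_q.get()               # block until a new frame arrives
              mirror.apply(mirror.cmd - gain * (recon @ s))   # integrator law

      recon = np.zeros((97, 2 * 193))          # placeholder reconstructor matrix
      threading.Thread(target=detection_loop, args=(DummySensor(),), daemon=True).start()
      threading.Thread(target=correction_loop, args=(DummyMirror(), recon), daemon=True).start()
      time.sleep(0.2)                          # let the loops run briefly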

  14. Predator-induced phenotypic plasticity of shape and behavior: parallel and unique patterns across sexes and species

    PubMed Central

    Kinnison, Michael T.

    2017-01-01

    Phenotypic plasticity is often an adaptation of organisms to cope with temporally or spatially heterogeneous landscapes. Like other adaptations, one would predict that different species, populations, or sexes might thus show some degree of parallel evolution of plasticity, in the form of parallel reaction norms, when exposed to analogous environmental gradients. Indeed, one might even expect parallelism of plasticity to repeatedly evolve in multiple traits responding to the same gradient, resulting in integrated parallelism of plasticity. In this study, we experimentally tested for parallel patterns of predator-mediated plasticity of size, shape, and behavior of 2 species and sexes of mosquitofish. Examination of behavioral trials indicated that the 2 species showed unique patterns of behavioral plasticity, whereas the 2 sexes in each species showed parallel responses. Fish shape showed parallel patterns of plasticity for both sexes and species, although males showed evidence of unique plasticity related to reproductive anatomy. Moreover, patterns of shape plasticity due to predator exposure were broadly parallel to what has been depicted for predator-mediated population divergence in other studies (slender bodies, expanded caudal regions, ventrally located eyes, and reduced male gonopodia). We did not find evidence of phenotypic plasticity in fish size for either species or sex. Hence, our findings support broadly integrated parallelism of plasticity for sexes within species and less integrated parallelism for species. We interpret these findings with respect to their potential broader implications for the interacting roles of adaptation and constraint in the evolutionary origins of parallelism of plasticity in general. PMID:29491997

  15. Parallel solution of the symmetric tridiagonal eigenproblem. Research report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jessup, E.R.

    1989-10-01

    This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method, Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hypercube multiprocessor reveal that Cuppen's method is the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. The thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.

  16. Parallel solution of the symmetric tridiagonal eigenproblem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jessup, E.R.

    1989-01-01

    This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method, Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hypercube multiprocessor reveal that Cuppen's method is the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. The thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.
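
    The bisection-plus-inverse-iteration combination singled out in these records is exposed today through LAPACK (?STEBZ/?STEIN) and is reachable, for experimentation, from SciPy. A small sketch of that route, together with the residual and orthogonality checks used to judge eigenvector quality:

      import numpy as np
      from scipy.linalg import eigh_tridiagonal

      n = 500
      d = np.random.rand(n)        # diagonal of T
      e = np.random.rand(n - 1)    # off-diagonal of T

      # Bisection for eigenvalues, inverse iteration for eigenvectors:
      w, v = eigh_tridiagonal(d, e, lapack_driver='stebz')

      # Accuracy metrics of the kind the thesis compares:
      T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
      residual = np.linalg.norm(T @ v - v * w)          # ||T V - V diag(w)||
      orthogonality = np.linalg.norm(v.T @ v - np.eye(n))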

  17. Computational mechanics analysis tools for parallel-vector supercomputers

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning

    1993-01-01

    Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.

  18. Tile-based Level of Detail for the Parallel Age

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Niski, K; Cohen, J D

    Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach to parallelizing rendering with level of detail based on hierarchical, screen-space tiles. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.

  19. AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System

    NASA Astrophysics Data System (ADS)

    Wang, R.; Harris, C.; Wicenec, A.

    2016-07-01

    In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. With this in mind, we looked into various storage backend techniques that could enable parallel I/O for CTDS by implementing new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS-based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS storage manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.

  20. A Parallel Genetic Algorithm for Automated Electronic Circuit Design

    NASA Technical Reports Server (NTRS)

    Long, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris

    2000-01-01

    Parallelized versions of genetic algorithms (GAs) are popular primarily for three reasons: the GA is an inherently parallel algorithm, typical GA applications are very compute intensive, and powerful computing platforms, especially Beowulf-style computing clusters, are becoming more affordable and easier to implement. In addition, the low communication bandwidth required allows the use of inexpensive networking hardware such as standard office Ethernet. In this paper we describe a parallel GA and its use in automated high-level circuit design. Genetic algorithms are a type of trial-and-error search technique guided by principles of Darwinian evolution. Just as the genetic material of two living organisms can intermix to produce offspring that are better adapted to their environment, GAs expose genetic material, frequently strings of 1s and 0s, to the forces of artificial evolution: selection, mutation, recombination, etc. GAs start with a pool of randomly generated candidate solutions which are then tested and scored with respect to their utility. Solutions are then bred by probabilistically selecting high quality parents and recombining their genetic representations to produce offspring solutions. Offspring are typically subjected to a small amount of random mutation. After a pool of offspring is produced, this process iterates until a satisfactory solution is found or an iteration limit is reached. Genetic algorithms have been applied to a wide variety of problems in many fields, including chemistry, biology, and many engineering disciplines. There are many styles of parallelism used in implementing parallel GAs. One such method is called the master-slave or processor farm approach. In this technique, slave nodes are used solely to compute fitness evaluations (the most time consuming part). The master processor collects fitness scores from the nodes and performs the genetic operators (selection, reproduction, variation, etc.). Because of dependency issues in the GA, it is possible to have idle processors. However, as long as the load at each processing node is similar, the processors are kept busy nearly all of the time. In applying GAs to circuit design, a suitable genetic representation is that of a circuit-construction program. We discuss one such circuit-construction programming language and show how evolution can generate useful analog circuit designs. This language has the desirable property that virtually all sets of combinations of primitives result in valid circuit graphs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. Using a parallel genetic algorithm and circuit simulation software, we present experimental results as applied to three analog filter and two amplifier design tasks. For example, a figure shows an 85 dB amplifier design evolved by our system, and another figure shows the performance of that circuit (gain and frequency response). In all tasks, our system is able to generate circuits that achieve the target specifications.
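
    The master-slave (processor farm) organization described above can be sketched with a process pool standing in for the slave nodes: workers evaluate fitness only, while the master performs selection, crossover, and mutation. This is a hedged, minimal sketch; the fitness function is a hypothetical stand-in for a scored circuit simulation, not the authors' circuit-construction language.

      import numpy as np
      from multiprocessing import Pool

      def fitness(genome):
          """Hypothetical stand-in for a scored circuit simulation."""
          return -np.sum((genome - 0.5) ** 2)

      def evolve(pop_size=64, genes=32, gens=100, pmut=0.02, seed=0):
          rng = np.random.default_rng(seed)
          pop = rng.random((pop_size, genes))
          best_g, best_s = None, -np.inf
          with Pool() as workers:                    # the "slave" nodes
              for _ in range(gens):
                  scores = np.array(workers.map(fitness, pop))   # farmed out
                  i = int(np.argmax(scores))
                  if scores[i] > best_s:
                      best_s, best_g = scores[i], pop[i].copy()
                  # Master: tournament selection, crossover, mutation.
                  a, b = rng.integers(pop_size, size=(2, pop_size))
                  parents = pop[np.where(scores[a] > scores[b], a, b)]
                  mask = rng.random((pop_size, genes)) < 0.5
                  pop = np.where(mask, parents, parents[::-1])       # crossover
                  pop = pop + pmut * rng.standard_normal(pop.shape)  # mutation
          return best_g

      if __name__ == '__main__':
          best = evolve()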

  1. Refraction law and Fermat principle: a project using the ant colony optimization algorithm for undergraduate students in physics

    NASA Astrophysics Data System (ADS)

    Vuong, Q. L.; Rigaut, C.; Gossuin, Y.

    2018-07-01

    A programming project for undergraduate students in physics is proposed in this work. Its goal is to check the Snell–Descartes law of refraction using the Fermat principle and the ant colony optimization algorithm. The project involves basic mathematics and physics and is adapted to students with basic programming skills. More advanced tools, such as parallelization or object-oriented programming, can be used but are not mandatory, which makes the project also suitable for more experienced students. We propose two tests to validate the program. Our algorithm is able to find solutions which are close to the theoretical predictions. Two quantities are defined to study its convergence and the quality of the solutions. It is also shown that the choice of the values of the simulation parameters is important for efficiently obtaining precise results.
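
    The core of such a project can be sketched compactly: ants choose a crossing point on the interface, pheromone is deposited in inverse proportion to the optical travel time, and the trail concentrates near the Fermat path, reproducing the Snell-Descartes crossing point. The geometry and parameters below are illustrative assumptions, not the authors' setup.

      import numpy as np

      def aco_refraction(n1=1.0, n2=1.5, ants=50, iters=200, rho=0.1, seed=1):
          """Find the interface crossing point minimizing optical travel time.

          Source at (0, 1) in medium 1, detector at (1, -1) in medium 2,
          flat interface y = 0 discretized into candidate crossing points x.
          """
          rng = np.random.default_rng(seed)
          x = np.linspace(0.0, 1.0, 201)                  # candidate crossings
          time = n1 * np.hypot(x, 1.0) + n2 * np.hypot(1.0 - x, 1.0)
          tau = np.ones_like(x)                           # pheromone trail
          for _ in range(iters):
              p = tau / tau.sum()
              chosen = rng.choice(x.size, size=ants, p=p) # ants pick crossings
              tau *= (1.0 - rho)                          # evaporation
              np.add.at(tau, chosen, 1.0 / time[chosen])  # deposit ~ quality
          return x[np.argmax(tau)]   # approaches the Snell-Descartes point

      best = aco_refraction()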

  2. GPU Lossless Hyperspectral Data Compression System for Space Applications

    NASA Technical Reports Server (NTRS)

    Keymeulen, Didier; Aranki, Nazeeh; Hopson, Ben; Kiely, Aaron; Klimesh, Matthew; Benkrid, Khaled

    2012-01-01

    On-board lossless hyperspectral data compression reduces data volume in order to meet NASA and DoD limited downlink capabilities. At JPL, a novel, adaptive and predictive technique for lossless compression of hyperspectral data, named the Fast Lossless (FL) algorithm, was recently developed. This technique uses an adaptive filtering method and achieves state-of-the-art performance in both compression effectiveness and low complexity. Because of its outstanding performance and suitability for real-time onboard hardware implementation, the FL compressor is being formalized as the emerging CCSDS standard for lossless multispectral and hyperspectral image compression. The FL compressor is well-suited for parallel hardware implementation. A GPU hardware implementation was developed for FL targeting the current state-of-the-art GPUs from NVIDIA. The GPU implementation on an NVIDIA GeForce GTX 580 achieves a throughput of 583.08 Mbits/sec (44.85 MSamples/sec), at least 6 times faster than a software implementation running on a 3.47 GHz single-core Intel Xeon processor. This paper describes the design and implementation of the FL algorithm on the GPU. The massively parallel implementation will in the future provide a fast and practical real-time solution for airborne and space applications.

  3. The Importance of Considering Differences in Study Design in Network Meta-analysis: An Application Using Anti-Tumor Necrosis Factor Drugs for Ulcerative Colitis.

    PubMed

    Cameron, Chris; Ewara, Emmanuel; Wilson, Florence R; Varu, Abhishek; Dyrda, Peter; Hutton, Brian; Ingham, Michael

    2017-11-01

    Adaptive trial designs present a methodological challenge when performing network meta-analysis (NMA), as data from adaptive trial designs differ from conventional parallel-design randomized controlled trials (RCTs). We aim to illustrate the importance of considering study design when conducting an NMA. Three NMAs comparing anti-tumor necrosis factor drugs for ulcerative colitis were compared and the analyses replicated using Bayesian NMA. The NMA comprised 3 RCTs comparing 4 treatments (adalimumab 40 mg, golimumab 50 mg, golimumab 100 mg, infliximab 5 mg/kg) and placebo. We investigated the impact of incorporating differences in study design among the 3 RCTs and presented 3 alternative methods for converting outcome data derived from one form of adaptive design to the format of more conventional parallel RCTs. Combining RCT results without considering variations in study design resulted in effect estimates that were biased against golimumab. In contrast, using the 3 alternative methods to convert outcome data from one form of adaptive design to a format more consistent with conventional parallel RCTs facilitated more transparent consideration of differences in study design. This approach is more likely to yield appropriate estimates of comparative efficacy when conducting an NMA that includes treatments evaluated with an alternative study design. RCTs based on adaptive study designs should not be combined with traditional parallel RCT designs in an NMA. We have presented potential approaches for converting data from one form of adaptive design to more conventional parallel RCTs to facilitate transparent and less biased comparisons.

  4. Conformational free energies of methyl-α-L-iduronic and methyl-β-D-glucuronic acids in water

    NASA Astrophysics Data System (ADS)

    Babin, Volodymyr; Sagui, Celeste

    2010-03-01

    We present a simulation protocol that allows for efficient sampling of the degrees of freedom of a solute in explicit solvent. The protocol involves using a nonequilibrium umbrella sampling method, in this case, the recently developed adaptively biased molecular dynamics method, to compute an approximate free energy for the slow modes of the solute in explicit solvent. This approximate free energy is then used to set up a Hamiltonian replica exchange scheme that samples both from biased and unbiased distributions. The final accurate free energy is recovered via the weighted histogram analysis technique applied to all the replicas, and equilibrium properties of the solute are computed from the unbiased trajectory. We illustrate the approach by applying it to the study of the puckering landscapes of the methyl glycosides of α-L-iduronic acid and its C5 epimer β-D-glucuronic acid in water. Big savings in computational resources are gained in comparison to the standard parallel tempering method.

  5. Conformational free energies of methyl-alpha-L-iduronic and methyl-beta-D-glucuronic acids in water.

    PubMed

    Babin, Volodymyr; Sagui, Celeste

    2010-03-14

    We present a simulation protocol that allows for efficient sampling of the degrees of freedom of a solute in explicit solvent. The protocol involves using a nonequilibrium umbrella sampling method, in this case, the recently developed adaptively biased molecular dynamics method, to compute an approximate free energy for the slow modes of the solute in explicit solvent. This approximate free energy is then used to set up a Hamiltonian replica exchange scheme that samples both from biased and unbiased distributions. The final accurate free energy is recovered via the weighted histogram analysis technique applied to all the replicas, and equilibrium properties of the solute are computed from the unbiased trajectory. We illustrate the approach by applying it to the study of the puckering landscapes of the methyl glycosides of alpha-L-iduronic acid and its C5 epimer beta-D-glucuronic acid in water. Big savings in computational resources are gained in comparison to the standard parallel tempering method.
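
    The Hamiltonian replica exchange step at the heart of this protocol can be sketched generically: replicas differ only in their bias potentials U_i, and neighbouring replicas attempt configuration swaps under the standard Metropolis criterion. This is a minimal sketch under those assumptions, not the authors' implementation.

      import numpy as np

      def attempt_swaps(configs, bias_energy, beta, rng):
          """One sweep of neighbour swaps for Hamiltonian replica exchange.

          configs: list of replica configurations.
          bias_energy: callable (replica_index, config) -> biased potential
          energy U_i(x); only the biases differ between replicas here.
          """
          for i in range(len(configs) - 1):
              j = i + 1
              # Metropolis criterion for exchanging configurations x_i, x_j:
              delta = beta * (bias_energy(i, configs[j]) + bias_energy(j, configs[i])
                              - bias_energy(i, configs[i]) - bias_energy(j, configs[j]))
              if rng.random() < np.exp(-delta):
                  configs[i], configs[j] = configs[j], configs[i]
          return configs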

  6. A new parallelization scheme for adaptive mesh refinement

    DOE PAGES

    Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.; ...

    2016-05-06

    Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e., wall time × processor count) of subcycling in time, but with the runtime performance (i.e., smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has more parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.

  7. A new parallelization scheme for adaptive mesh refinement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.

    Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e., wall time × processor count) of subcycling in time, but with the runtime performance (i.e., smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has more parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.

  8. 3D CSEM inversion based on goal-oriented adaptive finite element method

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Key, K.

    2016-12-01

    We present a parallel 3D frequency domain controlled-source electromagnetic inversion code named MARE3DEM. Non-linear inversion of observed data is performed with the Occam variant of regularized Gauss-Newton optimization. The forward operator is based on the goal-oriented finite element method that efficiently calculates the responses and sensitivity kernels in parallel using a data decomposition scheme where independent modeling tasks contain different frequencies and subsets of the transmitters and receivers. To accommodate complex 3D conductivity variation with high flexibility and precision, we adopt the dual-grid approach where the forward mesh conforms to the inversion parameter grid and is adaptively refined until the forward solution converges to the desired accuracy. This dual-grid approach is memory efficient, since the inverse parameter grid remains independent from the fine meshing generated around the transmitters and receivers by the adaptive finite element method. Moreover, the unstructured inverse mesh efficiently handles multiple scale structures and allows for fine-scale model parameters within the region of interest. Our mesh generation engine keeps track of the refinement hierarchy so that the map of conductivity and sensitivity kernel between the forward and inverse mesh is retained. We employ the adjoint-reciprocity method to calculate the sensitivity kernels which establish a linear relationship between changes in the conductivity model and changes in the modeled responses. Our code uses a direct solver for the linear systems, so the adjoint problem is efficiently computed by re-using the factorization from the primary problem. Further computational efficiency and scalability is obtained in the regularized Gauss-Newton portion of the inversion using parallel dense matrix-matrix multiplication and matrix factorization routines implemented with the ScaLAPACK library. We show the scalability, reliability and the potential of the algorithm to deal with complex geological scenarios by applying it to the inversion of synthetic marine controlled source EM data generated for a complex 3D offshore model with significant seafloor topography.

  9. 20-GFLOPS QR processor on a Xilinx Virtex-E FPGA

    NASA Astrophysics Data System (ADS)

    Walke, Richard L.; Smith, Robert W. M.; Lightbody, Gaye

    2000-11-01

    Adaptive beamforming can play an important role in sensor array systems in countering directional interference. In high-sample-rate systems, such as radar and communications, the calculation of adaptive weights is a computationally demanding task that requires highly parallel solutions. For systems where low power consumption and volume are important, the only viable implementation has been as an Application Specific Integrated Circuit (ASIC). However, the rapid advancement of Field Programmable Gate Array (FPGA) technology is enabling highly credible re-programmable solutions. In this paper we present the implementation of a scalable linear array processor for weight calculation using QR decomposition. We employ floating-point arithmetic with mantissa size optimized to the target application to minimize component size, and implement the operators as relationally placed macros (RPMs) on Xilinx Virtex FPGAs to achieve predictable dense layout and high-speed operation. We present results showing that 20 GFLOPS of sustained computation on a single XCV3200E-8 Virtex-E FPGA is possible. We also describe the parameterized implementation of the floating-point operators and QR-processor, and the design methodology that enables us to rapidly generate complex FPGA implementations using the industry standard hardware description language VHDL.
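
    The computation mapped onto such an array processor is QR decomposition, typically by Givens rotations: each rotation touches only two rows, so rotations can sweep through the array in parallel waves. A scalar NumPy sketch of the factorization follows (the FPGA design pipelines the same rotations in custom floating point; this is an illustration, not that design).

      import numpy as np

      def givens_qr(A):
          """QR factorization by Givens rotations; each rotation zeroes
          one subdiagonal entry of R while accumulating Q."""
          R = A.astype(float).copy()
          m, n = R.shape
          Q = np.eye(m)
          for j in range(n):
              for i in range(m - 1, j, -1):
                  a, b = R[i - 1, j], R[i, j]
                  r = np.hypot(a, b)
                  if r == 0.0:
                      continue
                  c, s = a / r, b / r
                  G = np.array([[c, s], [-s, c]])        # 2x2 rotation
                  R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
                  Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
          return Q, R

      A = np.random.randn(6, 3)
      Q, R = givens_qr(A)
      assert np.allclose(Q @ R, A)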

  10. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236

  11. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
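
    The adjustment these records describe can be pictured as feedback control over the number of I/O threads: measure throughput at the current setting and climb until the I/O subsystem saturates. The sketch below is a hedged illustration of that control idea with a hypothetical run_batch hook; it is not the IOPA programming interface.

      import time

      def tune_io_threads(run_batch, t_min=1, t_max=64):
          """Hill-climb the I/O thread count toward peak throughput.

          run_batch(k): runs one fixed-size batch of I/O work with k
          threads (hypothetical application hook) and returns on completion.
          """
          k, step, best_rate = t_min, 1, 0.0
          while t_min <= k <= t_max:
              t0 = time.perf_counter()
              run_batch(k)
              rate = 1.0 / (time.perf_counter() - t0)
              if rate > best_rate * 1.05:    # still improving: keep probing
                  best_rate, k = rate, k + step
                  step *= 2                  # exponential probe
              else:                          # I/O bandwidth saturated
                  return max(t_min, k - step // 2)
          return min(max(k, t_min), t_max)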

  12. Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, fast time- and space-parallel algorithms for the solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for the solution of the problem can be completely decoupled.

  13. Adaptively loaded SP-offset-QAM OFDM for IM/DD communication systems.

    PubMed

    Zhao, Jian; Chan, Chun-Kit

    2017-09-04

    In this paper, we propose adaptively loaded set-partitioned offset quadrature amplitude modulation (SP-offset-QAM) orthogonal frequency division multiplexing (OFDM) for low-cost intensity-modulation direct-detection (IM/DD) communication systems. We compare this scheme with multi-band carrier-less amplitude phase modulation (CAP) and conventional OFDM, and demonstrate >40 Gbit/s transmission over 50-km single-mode fiber. It is shown that the use of SP-QAM formats, together with an adaptive loading algorithm designed specifically for this group of formats, results in significant performance improvement for all three schemes. SP-offset-QAM OFDM exhibits greatly reduced complexity compared to SP-QAM based multi-band CAP, via parallelized implementation and minimized memory length for spectral shaping. On the other hand, this scheme shows better performance than SP-QAM based conventional OFDM both at back-to-back and after transmission. We also characterize the proposed scheme in terms of enhanced tolerance to fiber intra-channel nonlinearity and its potential to increase communication security. The studies show that adaptive SP-offset-QAM OFDM is a promising IM/DD solution for medium- and long-reach optical access networks and data center connections.

  14. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter

    In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.

  15. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

    DOE PAGES

    Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...

    2016-06-30

    In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
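
    The randomized-sampling building block underlying HSS compression is the adaptive randomized range finder: multiply the (never explicitly formed) matrix against random blocks, orthogonalize, and stop once the residual falls below tolerance. Below is a hedged, dense NumPy sketch of that primitive under those assumptions; the library's actual scheme is considerably more elaborate.

      import numpy as np

      def adaptive_range_finder(matvec, n, tol=1e-8, block=8, max_rank=256, seed=0):
          """Adaptive randomized range approximation Q with A ~ Q Q^T A.

          matvec: function X -> A @ X; A itself is never formed, matching
          the matrix-free spirit of randomized compression.
          """
          rng = np.random.default_rng(seed)
          Q = np.zeros((n, 0))
          while Q.shape[1] < max_rank:
              Y = matvec(rng.standard_normal((n, block)))   # sample the range
              Y -= Q @ (Q.T @ Y)                            # deflate known part
              if np.linalg.norm(Y) < tol:                   # rank resolved
                  break
              Qi, _ = np.linalg.qr(Y)
              Q = np.hstack([Q, Qi])
          return Q

      # Example on an explicitly low-rank matrix:
      n, r = 500, 20
      A = np.random.randn(n, r) @ np.random.randn(r, n)
      Q = adaptive_range_finder(lambda X: A @ X, n)
      err = np.linalg.norm(A - Q @ (Q.T @ A))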

  16. Scaling Semantic Graph Databases in Size and Performance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Villa, Oreste

    In this paper we present SGEM, a full software system for accelerating large-scale semantic graph databases on commodity clusters. Unlike current approaches, SGEM addresses semantic graph databases by employing graph methods at all levels of the stack. On the one hand, this allows exploiting the space efficiency of graph data structures and the inherent parallelism of graph algorithms. These features adapt well to the increasing system memory and core counts of modern commodity clusters. On the other hand, however, such systems are optimized for regular computation and batched data transfers, while graph methods usually are irregular and generate fine-grained data accesses with poor spatial and temporal locality. Our framework comprises a SPARQL to data-parallel C compiler, a library of parallel graph methods and a custom, multithreaded runtime system. We introduce our stack, motivate its advantages with respect to other solutions and show how we solved the challenges posed by irregular behaviors. We present the results of our software stack on the Berlin SPARQL benchmarks with datasets up to 10 billion triples (a triple corresponds to a graph edge), demonstrating scaling in dataset size and in performance as more nodes are added to the cluster.

  17. Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD

    NASA Technical Reports Server (NTRS)

    Gropp, W. D.; Keyes, D. E.; McInnes, L. C.; Tidriri, M. D.

    1998-01-01

    Implicit solution methods are important in applications modeled by PDEs with disparate temporal and spatial scales. Because such applications require high resolution with reasonable turnaround, "routine" parallelization is essential. The pseudo-transient matrix-free Newton-Krylov-Schwarz (Psi-NKS) algorithmic framework is presented as an answer. We show that, for the classical problem of three-dimensional transonic Euler flow about an M6 wing, Psi-NKS can simultaneously deliver: globalized, asymptotically rapid convergence through adaptive pseudo-transient continuation and Newton's method; reasonable parallelizability for an implicit method through deferred synchronization and favorable communication-to-computation scaling in the Krylov linear solver; and high per-processor performance through attention to distributed memory and cache locality, especially through the Schwarz preconditioner. Two discouraging features of Psi-NKS methods are their sensitivity to the coding of the underlying PDE discretization and the large number of parameters that must be selected to govern convergence. We therefore distill several recommendations from our experience and from our reading of the literature on various algorithmic components of Psi-NKS, and we describe a freely available, MPI-based portable parallel software implementation of the solver employed here.
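
    The pseudo-transient continuation ingredient can be sketched in isolation: march (I/dt + J) du = -R(u) with a pseudo-timestep grown by switched evolution relaxation (SER) as the residual falls, so the iteration turns into Newton's method near the solution. In this hedged sketch, dense linear algebra stands in for the matrix-free Krylov solves and Schwarz preconditioning of the actual Psi-NKS framework.

      import numpy as np

      def ptc_solve(residual, jacobian, u0, dt0=1e-2, tol=1e-10, max_steps=200):
          """Pseudo-transient continuation with SER time-step growth."""
          u = u0.copy()
          r = residual(u)
          r0 = np.linalg.norm(r)
          dt = dt0
          for _ in range(max_steps):
              if np.linalg.norm(r) < tol:
                  break
              A = np.eye(u.size) / dt + jacobian(u)   # pseudo-transient system
              u = u + np.linalg.solve(A, -r)
              r = residual(u)
              # SER: scale the pseudo-timestep by the residual reduction.
              dt = dt0 * r0 / max(np.linalg.norm(r), 1e-300)
          return u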

  18. Candidate genes and adaptive radiation: insights from transcriptional adaptation to the limnetic niche among coregonine fishes (Coregonus spp., Salmonidae).

    PubMed

    Jeukens, Julie; Bittner, David; Knudsen, Rune; Bernatchez, Louis

    2009-01-01

    In the past 40 years, there has been increasing acceptance that variation in levels of gene expression represents a major source of evolutionary novelty. Gene expression divergence is therefore likely to be involved in the emergence of incipient species, namely, in a context of adaptive radiation. In the lake whitefish species complex (Coregonus clupeaformis), previous microarray experiments have led to the identification of candidate genes potentially implicated in the parallel evolution of the limnetic dwarf lake whitefish, which is highly distinct from the benthic normal lake whitefish in life history, morphology, metabolism, and behavior, and yet diverged from it only approximately 15,000 years before present. The aim of the present study was to address transcriptional divergence for six candidate genes among lake whitefish and European whitefish (Coregonus lavaretus) species pairs, as well as lake cisco (Coregonus artedi) and vendace (Coregonus albula). The main goal was to test the hypothesis that parallel phenotypic adaptation toward the use of the limnetic niche in coregonine fishes is accompanied by parallelism in candidate gene transcription as measured by quantitative real-time polymerase chain reaction. Results obtained for three candidate genes, whereby parallelism in expression was observed across all whitefish species pairs, provide strong support for the hypothesis that divergent natural selection plays an important role in the adaptive radiation of whitefish species. However, this parallelism in expression did not extend to cisco and vendace, thereby failing to support transcriptional convergence between limnetic whitefish species and their limnetic congeners for these genes. As recently proposed (Lynch 2007a. The evolution of genetic networks by non-adaptive processes. Nat Rev Genet. 8:803-813), these results may suggest that convergent phenotypic evolution can result from nonadaptive shaping of genome architecture in independently evolved coregonine lineages.

  19. Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

    NASA Technical Reports Server (NTRS)

    Padovan, Joseph; Krishna, Lala; Gute, Douglas

    1997-01-01

    Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance among the associated bandwidths is achieved. The details are described in this report, which is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.

  20. Newton-like methods for Navier-Stokes solution

    NASA Astrophysics Data System (ADS)

    Qin, N.; Xu, X.; Richards, B. E.

    1992-12-01

    The paper reports on the Newton-like SFDN-alpha-GMRES and SQN-alpha-GMRES methods, which have been devised and proven as powerful schemes for large nonlinear problems typical of viscous compressible Navier-Stokes solutions. They can be applied using a partially converged solution from a conventional explicit or approximate implicit method. Developments have included the efficient parallelization of the schemes on a distributed-memory parallel computer. The methods are illustrated using a RISC workstation and a transputer parallel system, respectively, to solve a hypersonic vortical flow.

  1. Wavelet Transforms in Parallel Image Processing

    DTIC Science & Technology

    1994-01-27

    Report (137 pages). Keywords: object segmentation, texture segmentation, image compression, image halftoning, neural networks, parallel algorithms, 2D and 3D processing, vector quantization of wavelet transform coefficients, adaptive image halftoning based on wavelets. One application has been directed to adaptive image halftoning, in which the gray information at a pixel, including its gray value and gradient, is represented by ...

  2. Parallel-vector solution of large-scale structural analysis problems on supercomputers

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

    1989-01-01

    A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.

  3. Progress on the development of FullWave, a Hot and Cold Plasma Parallel Full Wave Code

    NASA Astrophysics Data System (ADS)

    Spencer, J. Andrew; Svidzinski, Vladimir; Zhao, Liangji; Kim, Jin-Soo

    2017-10-01

    FullWave is being developed at FAR-TECH, Inc. to simulate RF waves in hot inhomogeneous magnetized plasmas without making small orbit approximations. FullWave is based on a meshless formulation in configuration space on non-uniform clouds of computational points (CCP) adapted to better resolve plasma resonances, antenna structures and complex boundaries. The linear frequency domain wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel is calculated. The details of FullWave and some preliminary results will be presented, including: 1) a monitor function based on analytic solutions of the cold-plasma dispersion relation; 2) an adaptive CCP based on the monitor function; 3) construction of the finite differences for approximation of derivatives on adaptive CCP; 4) results of 2-D full wave simulations in the cold plasma model in tokamak geometry using the formulated approach for ECRH, ICRH and Lower Hybrid range of frequencies. Work is supported by the U.S. DOE SBIR program.

  4. Adaptive algorithm of selecting optimal variant of errors detection system for digital means of automation facility of oil and gas complex

    NASA Astrophysics Data System (ADS)

    Poluyan, A. Y.; Fugarov, D. D.; Purchina, O. A.; Nesterchuk, V. V.; Smirnova, O. V.; Petrenkova, S. B.

    2018-05-01

    To date, the problems associated with detecting errors in the digital equipment (DE) systems that automate explosive facilities of the oil and gas complex are highly relevant. The problem is especially acute for facilities where a loss of DE accuracy will inevitably lead to man-made disasters and substantial material damage; at such facilities, diagnostics of DE operation is one of the main elements of the industrial safety management system. This work solves the problem of selecting the optimal variant of an error detection system according to a validation criterion. Known methods for solving such problems have exponential complexity. Therefore, to reduce the time needed to solve the problem, the selection is performed with an adaptive bionic algorithm. Bionic algorithms (BA) have proven effective in solving optimization problems. Their advantages include adaptability, learning ability, parallelism, and the ability to build hybrid systems based on combining them [1].

  5. Application Characterization at Scale: Lessons learned from developing a distributed Open Community Runtime system for High Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Landwehr, Joshua B.; Suetterlein, Joshua D.; Marquez, Andres

    2016-05-16

    Since 2012, the U.S. Department of Energy's X-Stack program has been developing solutions including runtime systems, programming models, languages, compilers, and tools for Exascale system software to address crucial performance and power requirements. Fine-grain programming models and runtime systems show great potential to efficiently utilize the underlying hardware, and are thus essential to many X-Stack efforts. An abundance of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better, due to a decrease in state and control. Nevertheless, current applications have been written to exploit old paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of the new systems, applications need to be adapted to the new paradigms. As part of the applications' porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high performance runtime system, called the Open Community Runtime, using key HPC kernels as its vehicle. This study has the following contributions: one of the first high performance, fine-grain, distributed-memory runtime systems implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives running in both intra- and inter-node environments. Running on a general purpose cluster, we have found up to a 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.

  6. Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin

    2016-08-01

    This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is most noticeable at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that the spin-level parallelization is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16 cores. Multi-GPU performance is highly favorable under a weak scaling setting, reaching up to 99% efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended the range of accessible simulations to sizes of L = 32, 64 for a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
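
    The mid-point insertion strategy lends itself to a compact sketch. The toy function below (all names and the 0.2 rate threshold are illustrative assumptions, not taken from the paper) halves every temperature gap whose measured replica-exchange rate has fallen below a threshold:

        def refine_temperature_set(temps, exchange_rates, min_rate=0.2):
            """temps: sorted temperatures; exchange_rates[i] is the measured
            swap acceptance rate between temps[i] and temps[i+1]."""
            new_temps = [temps[0]]
            for t_lo, t_hi, rate in zip(temps, temps[1:], exchange_rates):
                if rate < min_rate:                        # bottleneck gap
                    new_temps.append(0.5 * (t_lo + t_hi))  # mid-point insertion
                new_temps.append(t_hi)
            return new_temps

        temps = [1.0, 1.5, 2.0, 3.0]
        rates = [0.45, 0.10, 0.30]     # the 1.5-2.0 gap is a bottleneck
        print(refine_temperature_set(temps, rates))  # [1.0, 1.5, 1.75, 2.0, 3.0]

    In the paper's scheme, the replicas created by such insertions are then redistributed across the GPUs to keep the ensemble balanced.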

  7. CFD analysis of hypersonic, chemically reacting flow fields

    NASA Technical Reports Server (NTRS)

    Edwards, T. A.

    1993-01-01

    Design studies are underway for a variety of hypersonic flight vehicles. The National Aero-Space Plane will provide a reusable, single-stage-to-orbit capability for routine access to low earth orbit. Flight-capable satellites will dip into the atmosphere to maneuver to new orbits, while planetary probes will decelerate at their destination by atmospheric aerobraking. To supplement limited experimental capabilities in the hypersonic regime, computational fluid dynamics (CFD) is being used to analyze the flow about these configurations. The governing equations include fluid dynamic as well as chemical species equations, which are being solved with new, robust numerical algorithms. Examples of CFD applications to hypersonic vehicles suggest an important role this technology will play in the development of future aerospace systems. The computational resources needed to obtain solutions are large, but solution adaptive grids, convergence acceleration, and parallel processing may make run times manageable.

  8. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    NASA Astrophysics Data System (ADS)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems.
    Program summary
    Program title: FILMPAR
    Catalogue identifier: AEEL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 530 421
    No. of bytes in distributed program, including test data, etc.: 1 960 313
    Distribution format: tar.gz
    Programming language: C++ and MPI
    Computer: Desktop, server
    Operating system: Unix/Linux, Mac OS X
    Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors
    RAM: 512 MBytes
    Classification: 12
    External routines: GNU C/C++, MPI
    Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering, industrial and physical applications. However, despite recent modelling advances, the accurate numerical solution of the equations governing such problems is still at a relatively early stage. Indeed, recent studies employing a simplifying long-wave approximation have shown that highly efficient numerical methods are necessary to solve the resulting lubrication equations in order to achieve the level of grid resolution required to accurately capture the effects of micro- and nano-scale topographical features.
    Solution method: A portable parallel multigrid algorithm has been developed for the above purpose, for the particular case of flow over submerged topographical features. Within the multigrid framework adopted, a W-cycle is used to accelerate convergence in respect of the time dependent nature of the problem, with relaxation sweeps performed using a fixed number of pre- and post-Red-Black Gauss-Seidel Newton iterations. In addition, the algorithm incorporates automatic adaptive time-stepping to avoid the computational expense associated with repeated time-step failure.
    Running time: 1.31 minutes using 128 processors on BlueGene/P with a problem size of over 16.7 million mesh points.
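
    As a point of reference for the relaxation scheme named under "Solution method", a red-black Gauss-Seidel sweep colours the unknowns like a checkerboard and updates one colour at a time, so every point of the active colour can be relaxed in parallel. The sketch below applies the idea to a plain 2D Poisson problem rather than FILMPAR's lubrication equations; grid size and right-hand side are illustrative assumptions:

        import numpy as np

        def red_black_gs_sweep(u, f, h):
            """One red-black Gauss-Seidel sweep for the 5-point Poisson
            stencil: a red point depends only on black neighbours, and
            vice versa, so each colour is a parallel update."""
            for colour in (0, 1):                    # 0 = red, 1 = black
                for i in range(1, u.shape[0] - 1):
                    for j in range(1, u.shape[1] - 1):
                        if (i + j) % 2 == colour:
                            u[i, j] = 0.25 * (u[i-1, j] + u[i+1, j]
                                              + u[i, j-1] + u[i, j+1]
                                              - h * h * f[i, j])
            return u

        n = 17
        u, f, h = np.zeros((n, n)), np.ones((n, n)), 1.0 / (n - 1)
        for _ in range(50):
            red_black_gs_sweep(u, f, h)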

  9. Parallel Evolution of Cold Tolerance within Drosophila melanogaster

    PubMed Central

    Braun, Dylan T.; Lack, Justin B.

    2017-01-01

    Drosophila melanogaster originated in tropical Africa before expanding into strikingly different temperate climates in Eurasia and beyond. Here, we find elevated cold tolerance in three distinct geographic regions: beyond the well-studied non-African case, we show that populations from the highlands of Ethiopia and South Africa have significantly increased cold tolerance as well. We observe greater cold tolerance in outbred versus inbred flies, but only in populations with higher inversion frequencies. Each cold-adapted population shows lower inversion frequencies than a closely related warm-adapted population, suggesting that inversion frequencies may decrease with altitude in addition to latitude. Using the FST-based “Population Branch Excess” statistic (PBE), we found only limited evidence for parallel genetic differentiation at the scale of ∼4 kb windows, specifically between Ethiopian and South African cold-adapted populations. And yet, when we looked for single nucleotide polymorphisms (SNPs) with codirectional frequency change in two or three cold-adapted populations, strong genomic enrichments were observed from all comparisons. These findings could reflect an important role for selection on standing genetic variation leading to “soft sweeps”. One SNP showed sufficient codirectional frequency change in all cold-adapted populations to achieve experiment-wide significance: an intronic variant in the synaptic gene Prosap. Another codirectional outlier SNP, at senseless-2, had a strong association with our cold trait measurements, but in the direction opposite to that predicted. More generally, proteins involved in neurotransmission were enriched as potential targets of parallel adaptation. The ability to study cold tolerance evolution in a parallel framework will enhance this classic study system for climate adaptation. PMID:27777283

  10. A parallel metaheuristic for large mixed-integer dynamic optimization problems, with applications in computational biology

    PubMed Central

    Henriques, David; González, Patricia; Doallo, Ramón; Saez-Rodriguez, Julio; Banga, Julio R.

    2017-01-01

    Background: We consider a general class of global optimization problems dealing with nonlinear dynamic models. Although this class is relevant to many areas of science and engineering, here we are interested in applying this framework to the reverse engineering problem in computational systems biology, which yields very large mixed-integer dynamic optimization (MIDO) problems. In particular, we consider the framework of logic-based ordinary differential equations (ODEs).
    Methods: We present saCeSS2, a parallel method for the solution of this class of problems. This method is based on a parallel cooperative scatter search metaheuristic, with new mechanisms of self-adaptation and specific extensions to handle large mixed-integer problems. We have paid special attention to the avoidance of convergence stagnation using adaptive cooperation strategies tailored to this class of problems.
    Results: We illustrate its performance with a set of three very challenging case studies from the domain of dynamic modelling of cell signaling. The simplest case study considers a synthetic signaling pathway and has 84 continuous and 34 binary decision variables. A second case study considers the dynamic modeling of signaling in liver cancer using high-throughput data, and has 135 continuous and 109 binary decision variables. The third case study is an extremely difficult problem related to breast cancer, involving 690 continuous and 138 binary decision variables. We report computational results obtained in different infrastructures, including a local cluster, a large supercomputer and a public cloud platform. Interestingly, the results show how the cooperation of individual parallel searches modifies the systemic properties of the sequential algorithm, achieving superlinear speedups compared to an individual search (e.g. speedups of 15 with 10 cores), and significantly improving the performance (by more than 60%) with respect to a non-cooperative parallel scheme. The scalability of the method is also good (tests were performed using up to 300 cores).
    Conclusions: These results demonstrate that saCeSS2 can be used to successfully reverse engineer large dynamic models of complex biological pathways. Further, these results open up new possibilities for other MIDO-based large-scale applications in the life sciences, such as metabolic engineering, synthetic biology, and drug scheduling. PMID:28813442
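
    The cooperative mechanism behind the superlinear speedups can be illustrated schematically: independent searches run in parallel and periodically share their best incumbent, so a poorly placed search is restarted from the best solution found elsewhere. The toy below (a quadratic objective, four simulated searches, all constants illustrative) shows only the cooperation pattern and is not saCeSS2 itself:

        import random

        def f(x):                                   # toy objective to minimise
            return sum(xi * xi for xi in x)

        def local_step(x, step=0.5):
            return [xi + random.uniform(-step, step) for xi in x]

        searches = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(4)]
        for it in range(200):
            for k, x in enumerate(searches):        # independent local searches
                cand = local_step(x)
                if f(cand) < f(x):
                    searches[k] = cand
            if it % 20 == 0:                        # periodic cooperation phase
                best = min(searches, key=f)
                worst = max(searches, key=f)
                searches[searches.index(worst)] = list(best)   # share incumbent

        print(min(f(x) for x in searches))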

  11. A parallel metaheuristic for large mixed-integer dynamic optimization problems, with applications in computational biology.

    PubMed

    Penas, David R; Henriques, David; González, Patricia; Doallo, Ramón; Saez-Rodriguez, Julio; Banga, Julio R

    2017-01-01

    We consider a general class of global optimization problems dealing with nonlinear dynamic models. Although this class is relevant to many areas of science and engineering, here we are interested in applying this framework to the reverse engineering problem in computational systems biology, which yields very large mixed-integer dynamic optimization (MIDO) problems. In particular, we consider the framework of logic-based ordinary differential equations (ODEs). We present saCeSS2, a parallel method for the solution of this class of problems. This method is based on a parallel cooperative scatter search metaheuristic, with new mechanisms of self-adaptation and specific extensions to handle large mixed-integer problems. We have paid special attention to the avoidance of convergence stagnation using adaptive cooperation strategies tailored to this class of problems. We illustrate its performance with a set of three very challenging case studies from the domain of dynamic modelling of cell signaling. The simplest case study considers a synthetic signaling pathway and has 84 continuous and 34 binary decision variables. A second case study considers the dynamic modeling of signaling in liver cancer using high-throughput data, and has 135 continuous and 109 binary decision variables. The third case study is an extremely difficult problem related to breast cancer, involving 690 continuous and 138 binary decision variables. We report computational results obtained in different infrastructures, including a local cluster, a large supercomputer and a public cloud platform. Interestingly, the results show how the cooperation of individual parallel searches modifies the systemic properties of the sequential algorithm, achieving superlinear speedups compared to an individual search (e.g. speedups of 15 with 10 cores), and significantly improving the performance (by more than 60%) with respect to a non-cooperative parallel scheme. The scalability of the method is also good (tests were performed using up to 300 cores). These results demonstrate that saCeSS2 can be used to successfully reverse engineer large dynamic models of complex biological pathways. Further, these results open up new possibilities for other MIDO-based large-scale applications in the life sciences, such as metabolic engineering, synthetic biology, and drug scheduling.

  12. Results from the OH-PT model: a Kinetic-MHD Model of the Outer Heliosphere within SWMF

    NASA Astrophysics Data System (ADS)

    Michael, A.; Opher, M.; Tenishev, V.; Borovikov, D.; Toth, G.

    2017-12-01

    We present an update of the OH-PT model, a kinetic-MHD model of the outer heliosphere. The OH-PT model couples the Outer Heliosphere (OH) and Particle Tracker (PT) components within the Space Weather Modeling Framework (SWMF). The OH component utilizes the Block-Adaptive Tree Solarwind Roe-type Upwind Scheme (BATS-R-US) MHD code, a highly parallel, 3D, and block-adaptive solver. As a stand-alone model, the OH component solves the ideal MHD equations for the plasma and a separate set of Euler's equations for the different populations of neutral atoms. The neutrals and plasma in the outer heliosphere are coupled through charge-exchange. While this provides an accurate solution for the plasma, it is an inaccurate description of the neutrals. The charge-exchange mean free path is on the order of the size of the heliosphere; therefore the neutrals cannot be described as a fluid. The PT component is based on the Adaptive Mesh Particle Simulator (AMPS) model, a 3D, direct simulation Monte Carlo model that solves the Boltzmann equation for the motion and interaction of multi-species plasma and is used to model the neutral distribution functions throughout the domain. The charge-exchange process occurs within AMPS, which handles each event on a particle-by-particle basis and calculates the resulting source terms to the MHD equations. The OH-PT model combines the MHD solution for the plasma with the kinetic solution for the neutrals to form a self-consistent model of the heliosphere. In this work, we present verification and validation of the model as well as demonstrate the code's capabilities. Furthermore, we provide a comparison of the OH-PT model to our multi-fluid approximation and detail the differences between the models in both the plasma solution and neutral distribution functions.

  13. Hybrid electric vehicles and electrochemical storage systems — a technology push-pull couple

    NASA Astrophysics Data System (ADS)

    Gutmann, Günter

    Until fuel cell electric vehicles (EVs) mature, hybrid electric vehicles (HEVs) can contribute to reduced emissions and energy consumption of personal cars as a short-term solution. Trade-offs reveal better emission control for series hybrid vehicles, while parallel hybrid vehicles with different drive trains may significantly reduce fuel consumption as well. At present, costs and marketing considerations favor parallel hybrid vehicles making use of small, high-power batteries. With ultra-high-power-density cells in development, exceeding 1 kW/kg, high-power batteries can be provided by adapting a technology closely related to consumer cell production. Energy consumption and emissions may benefit from regenerative braking and smoothing of the internal combustion engine (ICE) response as well, with limited additional battery weight. High-power supercapacitors may assist the achievement of this goal. Problems to be solved in practice comprise battery management, to assure equilibration of individual cell state-of-charge for long battery life without maintenance, and efficient strategies for low energy consumption.

  14. Multicoil resonance-based parallel array for smart wireless power delivery.

    PubMed

    Mirbozorgi, S A; Sawan, M; Gosselin, B

    2013-01-01

    This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, and implanted devices. The proposed powering system is based on a 4-coil resonance-based inductive link, the resonance coil of which is formed by an array of several paralleled coils acting as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by each individual coil in the array. Thus, the transmitted power scales with the load of the electric/electronic system to be powered, and does not divide equally over all the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables better power, size and cost efficiency than other solutions, since it does not need complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and coverage of an area 4 times larger than conventional topologies, with a power transfer efficiency of 55%, are the novelties of the presented work.

  15. A heuristic for suffix solutions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bilgory, A.; Gajski, D.D.

    1986-01-01

    The suffix problem has appeared in solutions of recurrence systems for parallel and pipelined machines and more recently in the design of gate and silicon compilers. In this paper the authors present two algorithms. The first algorithm generates parallel suffix solutions with minimum cost for a given length, time delay, availability of initial values, and fanout. This algorithm generates a minimal solution for any length n and depth range log₂ n to n. The second algorithm reduces the size of the solutions generated by the first algorithm.
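
    For orientation, the suffix problem asks for all running reductions s_i = x_i op x_{i+1} op ... op x_n of a sequence under an associative operator. A generic parallel schedule taking about log₂ n synchronous steps can be sketched by recursive doubling; this illustrates the computation being realized, not the minimum-cost circuit constructions of the paper's algorithms, and all names are ours:

        def parallel_suffix(xs, op):
            """Simulate the data-parallel recursive-doubling schedule:
            at step d, every element combines with the one d places to
            its right, so log2(n) synchronous steps suffice."""
            n = len(xs)
            s = list(xs)
            d = 1
            while d < n:
                # all updates in this step are independent -> parallel
                s = [op(s[i], s[i + d]) if i + d < n else s[i]
                     for i in range(n)]
                d *= 2
            return s

        print(parallel_suffix([1, 2, 3, 4], lambda a, b: a + b))  # [10, 9, 7, 4]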

  16. Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

    PubMed

    Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

    2014-12-23

    Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.

  17. Self-consistent treatment of the local dielectric permittivity and electrostatic potential in solution for polarizable macromolecular force fields.

    PubMed

    Hassan, Sergio A

    2012-08-21

    A self-consistent method is presented for the calculation of the local dielectric permittivity and electrostatic potential generated by a solute of arbitrary shape and charge distribution in a polar and polarizable liquid. The structure and dynamics of the liquid at the solute/liquid interface determine the spatial variations of the density and the dielectric response. Emphasis here is on the treatment of the interface. The method is an extension of conventional methods used in continuum protein electrostatics, and can be used to estimate changes in the static dielectric response of the liquid as it adapts to charge redistribution within the solute. This is most relevant in the context of polarizable force fields, during electronic structure optimization in quantum chemical calculations, or upon charge transfer. The method is computationally efficient and well suited for code parallelization, and can be used for on-the-fly calculations of the local permittivity in dynamics simulations of systems with large and heterogeneous charge distributions, such as proteins, nucleic acids, and polyelectrolytes. Numerical calculation of the system free energy is discussed for the general case of a liquid with field-dependent dielectric response.
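
    A minimal sketch of such a self-consistent loop, in one dimension and with an assumed saturating eps(E) model (the functional form and all parameters are illustrative, not the paper's), alternates between solving for the field with the current permittivity profile and updating the permittivity from the field until a fixed point is reached:

        import numpy as np

        def solve_field(eps, rho, h):
            # integrate d/dx (eps * E) = rho from a zero-field boundary
            flux = np.cumsum(rho) * h
            return flux / eps

        def eps_of_field(E, eps0=80.0, eps_inf=2.0, e_sat=1.0):
            # saturating response: high eps at weak field, low at strong
            return eps_inf + (eps0 - eps_inf) / (1.0 + (np.abs(E) / e_sat) ** 2)

        n, h = 100, 0.01
        rho = np.zeros(n)
        rho[10] = 5.0                          # a localised source charge
        eps = np.full(n, 80.0)
        for _ in range(100):                   # self-consistency iteration
            E = solve_field(eps, rho, h)
            new_eps = eps_of_field(E)
            if np.max(np.abs(new_eps - eps)) < 1e-8:
                break
            eps = 0.5 * eps + 0.5 * new_eps    # damped update for stability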

  18. Self-consistent treatment of the local dielectric permittivity and electrostatic potential in solution for polarizable macromolecular force fields

    NASA Astrophysics Data System (ADS)

    Hassan, Sergio A.

    2012-08-01

    A self-consistent method is presented for the calculation of the local dielectric permittivity and electrostatic potential generated by a solute of arbitrary shape and charge distribution in a polar and polarizable liquid. The structure and dynamics of the liquid at the solute/liquid interface determine the spatial variations of the density and the dielectric response. Emphasis here is on the treatment of the interface. The method is an extension of conventional methods used in continuum protein electrostatics, and can be used to estimate changes in the static dielectric response of the liquid as it adapts to charge redistribution within the solute. This is most relevant in the context of polarizable force fields, during electronic structure optimization in quantum chemical calculations, or upon charge transfer. The method is computationally efficient and well suited for code parallelization, and can be used for on-the-fly calculations of the local permittivity in dynamics simulations of systems with large and heterogeneous charge distributions, such as proteins, nucleic acids, and polyelectrolytes. Numerical calculation of the system free energy is discussed for the general case of a liquid with field-dependent dielectric response.

  19. Self-consistent treatment of the local dielectric permittivity and electrostatic potential in solution for polarizable macromolecular force fields

    PubMed Central

    Hassan, Sergio A.

    2012-01-01

    A self-consistent method is presented for the calculation of the local dielectric permittivity and electrostatic potential generated by a solute of arbitrary shape and charge distribution in a polar and polarizable liquid. The structure and dynamics of the liquid at the solute/liquid interface determine the spatial variations of the density and the dielectric response. Emphasis here is on the treatment of the interface. The method is an extension of conventional methods used in continuum protein electrostatics, and can be used to estimate changes in the static dielectric response of the liquid as it adapts to charge redistribution within the solute. This is most relevant in the context of polarizable force fields, during electronic structure optimization in quantum chemical calculations, or upon charge transfer. The method is computationally efficient and well suited for code parallelization, and can be used for on-the-fly calculations of the local permittivity in dynamics simulations of systems with large and heterogeneous charge distributions, such as proteins, nucleic acids, and polyelectrolytes. Numerical calculation of the system free energy is discussed for the general case of a liquid with field-dependent dielectric response. PMID:22920098

  20. Safe medication management in specialized home healthcare - an observational study.

    PubMed

    Lindblad, Marléne; Flink, Maria; Ekstedt, Mirjam

    2017-08-24

    Medication management is a complex, error-prone process. The aim of this study was to explore what constitutes the complexity of the medication management process (MMP) in specialized home healthcare and how healthcare professionals handle this complexity. The study is theoretically based in resilience engineering. Data were collected during the MMP at three specialized home healthcare units in Sweden using two strategies: observation of workplaces and shadowing RNs in everyday work, including interviews. Transcribed material was analysed using grounded theory. The MMP in home healthcare was dynamic and complex, with unclear boundaries of responsibility, inadequate information systems and fluctuating work conditions. Healthcare professionals adapted their everyday clinical work by sharing responsibility and simultaneously being authoritative and preserving patients' active participation, autonomy and integrity. To promote a safe MMP, healthcare professionals constantly re-prioritized goals and handled gaps in communication and information transmission at a distance by creating new bridging solutions. Trade-offs and workarounds were necessary elements, but also posed a threat to patient safety, as these interim solutions were neither systematically evaluated nor turned into learning strategies. To manage a safe medication process in home healthcare, healthcare professionals need to adapt to fluctuating conditions and create bridging strategies through multiple parallel activities distributed over time, space and actors. The healthcare professionals' strategies could be integrated into continuous learning, while preserving boundaries of safety, instead of remaining more or less interim solutions. Patients and family caregivers as active partners in the MMP may be an underestimated resource for a resilient home healthcare.

  1. A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant

    NASA Astrophysics Data System (ADS)

    Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.

    2018-04-01

    A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.

  2. Understanding the leaky engineering pipeline: Motivation and job adaptability of female engineers

    NASA Astrophysics Data System (ADS)

    Saraswathiamma, Manjusha Thekkedathu

    This dissertation is a mixed-method study conducted using qualitative grounded theory and quantitative survey and correlation approaches. This study aims to explore the motivation and adaptability of females in the engineering profession and to develop a theoretical framework for both motivation and adaptability issues. As a result, this study endeavors to design solutions for the low enrollment and attrition of female engineers in the engineering profession, often referred to as the "leaky female engineering pipeline." Profiles of 123 female engineers were studied for the qualitative approach, and 98 completed survey responses were analyzed for the quantitative approach. The qualitative, grounded-theory approach applied the constant comparison method; open, axial, and selective coding was used to classify the information into categories, sub-categories, and themes for both motivation and adaptability. The emergent themes for decisions motivating female enrollment include cognitive, emotional, and environmental factors. The themes identified for adaptability include the seven job adaptability factors: job satisfaction, risk-taking attitude, career/skill development, family, gender stereotyping, interpersonal skills, and personal benefit, as well as the self-perceived job adaptability factor. Illeris' Three-dimensional Learning Theory was modified as a model for decisions motivating female enrollment. This study suggests a first conceptual parallel to McClusky's Theory of Margin for the adaptability of female engineers in the profession. Also, this study attempted to design a survey instrument to measure the job adaptability of female engineers. The study identifies two factors that are significantly related to job adaptability: interpersonal skills (p < 0.01) and family (p < 0.05); gender stereotyping and personal benefit are other factors that are also significantly related (p < 0.1).

  3. Temperature compensated photovoltaic array

    DOEpatents

    Mosher, D.M.

    1997-11-18

    A temperature compensated photovoltaic module comprises a series of solar cells having a thermally activated switch connected in parallel with several of the cells. The photovoltaic module is adapted to charge conventional batteries having a temperature coefficient differing from the temperature coefficient of the module. The calibration temperatures of the switches are chosen so that the colder the ambient temperature of the module, the more switches turn on, forming closed circuits that short the associated solar cells. By shorting some of the solar cells as the ambient temperature decreases, the battery being charged by the module is not excessively overcharged at lower temperatures. The PV module is an integrated solution that is reliable and inexpensive. 2 figs.

  4. Dendritic Growth with Fluid Flow for Pure Materials

    NASA Technical Reports Server (NTRS)

    Jeong, Jun-Ho; Dantzig, Jonathan A.; Goldenfeld, Nigel

    2003-01-01

    We have developed a three-dimensional, adaptive, parallel finite element code to examine solidification of pure materials under conditions of forced flow. We have examined the effect of undercooling, surface tension anisotropy and imposed flow velocity on the growth. The flow significantly alters the growth process, producing dendrites that grow faster, and with greater tip curvature, into the flow. The selection constant decreases slightly with flow velocity in our calculations. The results of the calculations agree well with the transport solution of Saville and Beaghton at high undercooling and high anisotropy. At low undercooling, significant deviations are found. We attribute this difference to the influence of other parts of the dendrite, removed from the tip, on the flow field.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal/Fluid Team

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  6. Direct-current nanogenerator driven by ultrasonic waves.

    PubMed

    Wang, Xudong; Song, Jinhui; Liu, Jin; Wang, Zhong Lin

    2007-04-06

    We have developed a nanowire nanogenerator that is driven by an ultrasonic wave to produce continuous direct-current output. The nanogenerator was fabricated with vertically aligned zinc oxide nanowire arrays that were placed beneath a zigzag metal electrode with a small gap. The wave drives the electrode up and down to bend and/or vibrate the nanowires. A piezoelectric-semiconducting coupling process converts mechanical energy into electricity. The zigzag electrode acts as an array of parallel integrated metal tips that simultaneously and continuously create, collect, and output electricity from all of the nanowires. The approach presents an adaptable, mobile, and cost-effective technology for harvesting energy from the environment, and it offers a potential solution for powering nanodevices and nanosystems.

  7. From pilot's associate to satellite controller's associate

    NASA Technical Reports Server (NTRS)

    Neyland, David L.; Lizza, Carl; Merkel, Philip A.

    1992-01-01

    Associate technology is an emerging engineering discipline wherein intelligent automation can significantly augment the performance of man-machine systems. An associate system is one that monitors operator activity and adapts its operational behavior accordingly. Associate technology is most effectively applied when mapped into management of the human-machine interface and display-control loop in typical manned systems. This paper addresses the potential for application of associate technology to the arena of intelligent command and control of satellite systems, from diagnosis of onboard and on-ground satellite system fault conditions to execution of nominal satellite control functions. Rather than specifying a specific solution, this paper draws parallels between the Pilot's Associate concept and the domain of satellite control.

  8. Load Balancing Unstructured Adaptive Grids for CFD Problems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid

    1996-01-01

    Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
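
    The two decisions described above, whether to repartition at all and whether to accept the new mapping, can be sketched schematically. The thresholds and the amortization cost model below are illustrative assumptions, not the paper's:

        def should_repartition(loads, tolerance=1.1):
            """Repartition only if max load exceeds the average by more
            than the tolerated imbalance factor."""
            avg = sum(loads) / len(loads)
            return max(loads) / avg > tolerance

        def accept_remap(old_loads, new_loads, migration_cost,
                         steps_to_amortize=10):
            """Accept the new mapping only if the per-step balance gain,
            amortized over future solver steps, outweighs the one-time
            cost of moving data between processors."""
            gain_per_step = max(old_loads) - max(new_loads)
            return gain_per_step * steps_to_amortize > migration_cost

        loads = [120, 80, 95, 210]              # elements per processor, say
        if should_repartition(loads):
            new_loads = [126, 126, 127, 126]    # from a fresh partitioning
            print(accept_remap(loads, new_loads, migration_cost=300))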

  9. Hybrid Self-Adaptive Evolution Strategies Guided by Neighborhood Structures for Combinatorial Optimization Problems.

    PubMed

    Coelho, V N; Coelho, I M; Souza, M J F; Oliveira, T A; Cota, L P; Haddad, M N; Mladenovic, N; Silva, R C P; Guimarães, F G

    2016-01-01

    This article presents an Evolution Strategy (ES)-based algorithm, designed to self-adapt its mutation operators, guiding the search through the solution space using a Self-Adaptive Reduced Variable Neighborhood Search procedure. In view of the specific local search operators for each individual, the proposed population-based approach also fits into the context of Memetic Algorithms. The proposed variant uses the Greedy Randomized Adaptive Search Procedure with different greedy parameters for generating its initial population, providing an interesting exploration-exploitation balance. To validate the proposal, this framework is applied to solve three different NP-hard combinatorial optimization problems: an Open-Pit-Mining Operational Planning Problem with dynamic allocation of trucks, an Unrelated Parallel Machine Scheduling Problem with Setup Times, and the calibration of a hybrid fuzzy model for Short-Term Load Forecasting. Computational results point out the convergence of the proposed model and highlight its ability to combine move operations from distinct neighborhood structures throughout the optimization. The results gathered and reported in this article represent collective evidence of the performance of the method on challenging combinatorial optimization problems from different application domains. The proposed evolution strategy demonstrates an ability to adapt the strength of the mutation disturbance during the generations of its evolution process. The effectiveness of the proposal motivates the application of this novel evolutionary framework to solving other combinatorial optimization problems.
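
    The guiding idea of a reduced variable neighborhood search can be sketched compactly: perturb the incumbent in neighborhood k, fall back to the smallest neighborhood on improvement, and widen otherwise. The bit-flip neighborhoods and toy objective below are illustrative assumptions, not the article's operators:

        import random

        def rvns(x, f, k_max=4, iters=1000):
            """Reduced VNS sketch: neighborhood k flips k random bits."""
            for _ in range(iters):
                k = 1
                while k <= k_max:
                    y = list(x)
                    for pos in random.sample(range(len(x)), k):
                        y[pos] = 1 - y[pos]          # flip k bits
                    if f(y) < f(x):
                        x, k = y, 1   # improvement: back to smallest move
                    else:
                        k += 1        # no luck: try a larger neighborhood
            return x

        onemax = lambda bits: sum(bits)   # minimise the number of ones
        print(sum(rvns([1] * 20, onemax)))  # close to 0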

  10. Using the GeoFEST Faulted Region Simulation System

    NASA Technical Reports Server (NTRS)

    Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

    2004-01-01

    GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load balance and low communication, and careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than (order of) 4000 elements per processor. The source code and documentation for GeoFEST are available at no cost from the Open Channel Foundation. In addition, GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.
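
    The diagonally (Jacobi) preconditioned conjugate gradient method at the heart of the solver follows the standard recurrence; in the parallel setting only dot products and matrix-vector products need communication. A serial sketch with a toy matrix (GeoFEST's distributed implementation is not reproduced here):

        import numpy as np

        def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
            """Conjugate gradients preconditioned by the diagonal of A."""
            x = np.zeros_like(b)
            r = b - A @ x
            M_inv = 1.0 / np.diag(A)         # the diagonal preconditioner
            z = M_inv * r
            p = z.copy()
            rz = r @ z
            for _ in range(max_iter):
                Ap = A @ p
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol:
                    break
                z = M_inv * r
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x

        A = np.array([[4.0, 1.0], [1.0, 3.0]])
        b = np.array([1.0, 2.0])
        print(jacobi_pcg(A, b))              # ~ [0.0909, 0.6364]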

  11. A Parallel Adaptive Simulation Tool for Two Phase Steady State Reacting Flows in Industrial Boilers and Furnaces

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Michael J. Bockelie

    2002-01-04

    This DOE SBIR Phase II final report summarizes research that has been performed to develop a parallel adaptive tool for modeling steady, two phase turbulent reacting flow. The target applications for the new tool are full scale, fossil-fuel fired boilers and furnaces such as those used in the electric utility industry, chemical process industry and mineral/metal process industry. The types of analyses to be performed on these systems are engineering calculations to evaluate the impact on overall furnace performance due to operational, process or equipment changes. To develop a Computational Fluid Dynamics (CFD) model of an industrial scale furnace requires a carefully designed grid that will capture all of the large and small scale features of the flowfield. Industrial systems are quite large, usually measured in tens of feet, but contain numerous burners, air injection ports, flames and localized behavior with dimensions that are measured in inches or fractions of inches. To create an accurate computational model of such systems requires capturing length scales within the flow field that span several orders of magnitude. In addition, to create an industrially useful model, the grid cannot contain too many grid points - the model must be able to execute on an inexpensive desktop PC in a matter of days. An adaptive mesh provides a convenient means to create a grid that can capture both fine flow field detail within a very large domain with a ''reasonable'' number of grid points. However, the use of an adaptive mesh requires the development of a new flow solver. To create the new simulation tool, we have combined existing reacting CFD modeling software with new software based on emerging block structured Adaptive Mesh Refinement (AMR) technologies developed at Lawrence Berkeley National Laboratory (LBNL). Specifically, we combined: physical models, modeling expertise, and software from existing combustion simulation codes used by Reaction Engineering International; mesh adaption, data management, and parallelization software and technology being developed by users of the BoxLib library at LBNL; and solution methods for problems formulated on block structured grids that were being developed in collaboration with technical staff members at the University of Utah Center for High Performance Computing (CHPC) and at LBNL. The combustion modeling software used by Reaction Engineering International represents an investment of over fifty man-years of development, conducted over a period of twenty years. Thus, it was impractical to achieve our objective by starting from scratch. The research program resulted in an adaptive grid, reacting CFD flow solver that can be used only on limited problems. In its current form the code is appropriate for use on academic problems with simplified geometries. The new solver is not sufficiently robust or sufficiently general to be used in a ''production mode'' for industrial applications. The principal difficulty lies with the multi-level solver technology. The use of multi-level solvers on adaptive grids with embedded boundaries is not yet a mature field and there are many issues that remain to be resolved. From the lessons learned in this SBIR program, we have started work on a new flow solver with an AMR capability. The new code is based on a conventional cell-by-cell mesh refinement strategy used in unstructured grid solvers that employ hexahedral cells. The new solver employs several of the concepts and solution strategies developed within this research program. The formulation of the composite grid problem for the new solver has been designed to avoid the embedded-boundary complications encountered in this SBIR project. This follow-on effort will result in a reacting flow CFD solver with localized mesh capability that can be used to perform engineering calculations on industrial problems in a production mode.

  12. Fast-NPS-A Markov Chain Monte Carlo-based analysis tool to obtain structural information from single-molecule FRET measurements

    NASA Astrophysics Data System (ADS)

    Eilert, Tobias; Beckers, Maximilian; Drechsler, Florian; Michaelis, Jens

    2017-10-01

    The analysis tool and software package Fast-NPS can be used to analyse smFRET data to obtain quantitative structural information about macromolecules in their natural environment. In the algorithm, a Bayesian model gives rise to a multivariate probability distribution describing the uncertainty of the structure determination. Since Fast-NPS aims to be an easy-to-use general-purpose analysis tool for a large variety of smFRET networks, we established an MCMC-based sampling engine that approximates the target distribution and requires no parameter specification by the user at all. For efficient local exploration we automatically adapt the multivariate proposal kernel according to the shape of the target distribution. In order to handle multimodality, the sampler is equipped with a parallel tempering scheme that is fully adaptive with respect to temperature spacing and number of chains. Since the molecular surroundings of a dye molecule affect its spatial mobility and thus the smFRET efficiency, we introduce dye models which can be selected for every dye molecule individually. These models allow the user to represent the smFRET network in great detail, leading to increased localisation precision. Finally, a tool to validate the chosen model combination is provided.
    Programme files doi: http://dx.doi.org/10.17632/7ztzj63r68.1
    Licencing provisions: Apache-2.0
    Programming language: GUI in MATLAB (The MathWorks); core sampling engine in C++
    Nature of problem: Sampling of highly diverse multivariate probability distributions in order to solve for macromolecular structures from smFRET data.
    Solution method: MCMC algorithm with fully adaptive proposal kernel and parallel tempering scheme.
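
    The parallel tempering mechanism can be illustrated on a bimodal 1D target: chains at higher temperatures cross the barrier easily, and swap moves propagate those states down to the cold chain. The sketch below uses fixed temperatures and a fixed proposal width; the adaptive kernel and adaptive temperature ladder of Fast-NPS are not reproduced, and the target and constants are illustrative:

        import math, random

        def log_target(x):
            # two well-separated modes; small floor guards the far tails
            return math.log(math.exp(-(x - 4.0) ** 2)
                            + math.exp(-(x + 4.0) ** 2) + 1e-300)

        def accept(log_ratio):
            return random.random() < math.exp(min(0.0, log_ratio))

        temps = [1.0, 2.0, 4.0, 8.0]
        states = [0.0] * len(temps)
        for step in range(10000):
            for k, T in enumerate(temps):        # Metropolis update per chain
                prop = states[k] + random.gauss(0, 1)
                if accept((log_target(prop) - log_target(states[k])) / T):
                    states[k] = prop
            k = random.randrange(len(temps) - 1) # propose a neighbour swap
            log_acc = ((log_target(states[k]) - log_target(states[k + 1]))
                       * (1.0 / temps[k + 1] - 1.0 / temps[k]))
            if accept(log_acc):
                states[k], states[k + 1] = states[k + 1], states[k]

        print(states[0])                         # a draw from the T = 1 chain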

  13. Extent of QTL Reuse During Repeated Phenotypic Divergence of Sympatric Threespine Stickleback.

    PubMed

    Conte, Gina L; Arnegard, Matthew E; Best, Jacob; Chan, Yingguang Frank; Jones, Felicity C; Kingsley, David M; Schluter, Dolph; Peichel, Catherine L

    2015-11-01

    How predictable is the genetic basis of phenotypic adaptation? Answering this question begins by estimating the repeatability of adaptation at the genetic level. Here, we provide a comprehensive estimate of the repeatability of the genetic basis of adaptive phenotypic evolution in a natural system. We used quantitative trait locus (QTL) mapping to discover genomic regions controlling a large number of morphological traits that have diverged in parallel between pairs of threespine stickleback (Gasterosteus aculeatus species complex) in Paxton and Priest lakes, British Columbia. We found that nearly half of QTL affected the same traits in the same direction in both species pairs. Another 40% influenced a parallel phenotypic trait in one lake but not the other. The remaining 10% of QTL had phenotypic effects in opposite directions in the two species pairs. Similarity in the proportional contributions of all QTL to parallel trait differences was about 0.4. Surprisingly, QTL reuse was unrelated to phenotypic effect size. Our results indicate that repeated use of the same genomic regions is a pervasive feature of parallel phenotypic adaptation, at least in sticklebacks. Identifying the causes of this pattern would aid prediction of the genetic basis of phenotypic evolution. Copyright © 2015 by the Genetics Society of America.

  14. Teaching ethics to engineers: ethical decision making parallels the engineering design process.

    PubMed

    Bero, Bridget; Kuhlman, Alana

    2011-09-01

    In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of their engineering courses. This paper discusses an ethics module in a 3rd-year engineering design course that focuses on the design process and technical writing. Engineering students early in their studies generally possess good black-and-white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" technical problems with multiple possible solutions. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had the greatest difficulty identifying the broader, global issues presented in an ethical situation but, by the end of the module, were better able not only to identify the broader issues but also to assess specific issues more comprehensively and to generate solutions and a desired response to the issue.

  15. A parallel domain decomposition-based implicit method for the Cahn–Hilliard–Cook phase-field equation in 3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zheng, Xiang; Yang, Chao

    2015-03-15

    We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.
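
    The adaptive time-stepping strategy can be sketched as a simple accept/grow, reject/cut controller wrapped around the implicit solve. The growth and cut factors, and the toy convergence test standing in for the Newton solver, are illustrative assumptions:

        def advance(t_end, dt0, newton_solve, grow=1.5, cut=0.5, dt_max=1.0):
            """March to t_end, enlarging dt after each converged implicit
            solve and shrinking it when the solve fails."""
            t, dt = 0.0, dt0
            while t < t_end - 1e-12:
                if newton_solve(dt):       # step accepted
                    t += dt
                    dt = min(dt * grow, dt_max, t_end - t + 1e-12)
                else:                      # step rejected: retry smaller
                    dt *= cut
            return t

        # toy stand-in: pretend Newton fails whenever the step is too large
        print(advance(10.0, 0.01, lambda dt: dt < 0.4))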

  16. apGA: An adaptive parallel genetic algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liepins, G.E.; Baluja, S.

    1991-01-01

    We develop apGA, a parallel variant of the standard generational GA, that combines aggressive search with perpetual novelty, yet is able to preserve enough genetic structure to optimally solve variably scaled, non-uniform block deceptive and hierarchical deceptive problems. apGA combines elitism, adaptive mutation, adaptive exponential scaling, and temporal memory. We present empirical results for six classes of problems, including the DeJong test suite. Although we have not investigated hybrids, we note that apGA could be incorporated into other recent GA variants such as GENITOR, CHC, and the recombination stage of mGA. 12 refs., 2 figs., 2 tabs.
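
    Two of the listed ingredients, elitism and adaptive mutation, can be shown in a toy generational GA in which the mutation rate rises as population diversity falls. OneMax stands in for the deceptive benchmarks, all constants are illustrative, and this is not apGA itself:

        import random

        def evolve(pop_size=40, length=32, gens=100):
            pop = [[random.randint(0, 1) for _ in range(length)]
                   for _ in range(pop_size)]
            fit = lambda ind: sum(ind)               # OneMax: maximise ones
            for _ in range(gens):
                pop.sort(key=fit, reverse=True)
                diversity = len({tuple(ind) for ind in pop}) / pop_size
                mut_rate = 0.01 + 0.2 * (1.0 - diversity)  # adapt to stagnation
                next_pop = [pop[0]]                  # elitism: keep the best
                while len(next_pop) < pop_size:
                    a, b = random.sample(pop[:pop_size // 2], 2)
                    cut = random.randrange(length)
                    child = a[:cut] + b[cut:]        # one-point crossover
                    child = [bit ^ (random.random() < mut_rate) for bit in child]
                    next_pop.append(child)
                pop = next_pop
            return max(pop, key=fit)

        print(sum(evolve()))          # close to 32 on this easy problem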

  17. In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett

    Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general three-dimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.

  18. In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

    DOE PAGES

    Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett; ...

    2017-01-01

    Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general three-dimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.
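
    The estimator-driven adaptation loop described above can be caricatured in one dimension: compute an error indicator per element, flag elements above a threshold, and split them. The indicator and threshold below are illustrative stand-ins for true a-posteriori error estimation, not part of the cited workflows:

        def adapt(elements, error_indicator, frac=0.5):
            """Split every element whose indicator exceeds a fraction of
            the maximum; keep the rest."""
            errors = [error_indicator(a, b) for a, b in elements]
            cutoff = frac * max(errors)
            refined = []
            for (a, b), err in zip(elements, errors):
                if err > cutoff:                 # flagged for refinement
                    mid = 0.5 * (a + b)
                    refined += [(a, mid), (mid, b)]
                else:
                    refined.append((a, b))
            return refined

        # resolve a steep feature near x = 0.5 on the unit interval
        indicator = lambda a, b: abs(b - a) / (1e-3 + abs(0.5 * (a + b) - 0.5))
        mesh = [(i / 8, (i + 1) / 8) for i in range(8)]
        for _ in range(3):
            mesh = adapt(mesh, indicator)
        print(len(mesh))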

  19. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for the simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by reproducing the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
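
    For reference, the inverse-iteration algorithm mentioned above converges to the eigenpair closest to a chosen shift by repeatedly solving a shifted linear system. A dense serial sketch follows (MARS distributes both the matrix construction and these solves; the toy matrix and shift are illustrative):

        import numpy as np

        def inverse_iteration(A, sigma, iters=50):
            """Solving (A - sigma*I) x_{k+1} = x_k drives x toward the
            eigenvector whose eigenvalue is closest to sigma."""
            n = A.shape[0]
            x = np.random.default_rng(0).standard_normal(n)
            B = A - sigma * np.eye(n)
            for _ in range(iters):
                x = np.linalg.solve(B, x)
                x /= np.linalg.norm(x)
            lam = x @ A @ x                  # Rayleigh quotient estimate
            return lam, x

        A = np.diag([1.0, 3.0, 10.0])
        print(inverse_iteration(A, 2.6)[0])  # converges to the eigenvalue 3.0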

  20. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  1. S-Genius, a universal software platform with versatile inverse problem resolution for scatterometry

    NASA Astrophysics Data System (ADS)

    Fuard, David; Troscompt, Nicolas; El Kalyoubi, Ismael; Soulan, Sébastien; Besacier, Maxime

    2013-05-01

    S-Genius is a new universal scatterometry platform, which gathers all the LTM-CNRS know-how regarding rigorous electromagnetic computation and several inverse-problem solver solutions. This software platform is built to be a user-friendly, light, swift, accurate, user-oriented scatterometry tool, compatible with any ellipsometric measurements and any type of pattern. It aims to combine a set of inverse-problem solver capabilities (via an adapted Levenberg-Marquardt optimization, Kriging, and Neural Network solutions) that greatly improve the reliability and the speed of the solution determination. Furthermore, as the model solution is mainly vulnerable to the optical properties of the materials, S-Genius may be coupled with an innovative determination of material refractive indices. This paper focuses in more detail on the modified Levenberg-Marquardt optimization, one of the indirect-method solvers, built by the first author in parallel with the overall S-Genius software development. This modified Levenberg-Marquardt optimization corresponds to a Newton algorithm with a damping parameter adapted to the definition domains of the optimized parameters. Currently, S-Genius is technically ready for scientific collaboration: python-powered, multi-platform (Windows/Linux/macOS), multi-core, ready for 2D (infinite features along the direction perpendicular to the incident plane), conical, and 3D-feature computation, compatible with all kinds of input data from any possible ellipsometer (angle- or wavelength-resolved) or reflectometer, and widely used in our laboratory for resist trimming studies, etching feature characterization (such as complex stacks) or nano-imprint lithography measurements, for instance. The work on the kriging solver, the neural-network solver and material refractive-index determination is complete (or nearly so) by other LTM members and is about to be integrated into the S-Genius platform.
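    The modified Levenberg-Marquardt step described here can be sketched as follows. This is a hedged illustration of the idea of damping adapted to the parameters' definition domains, using box bounds and a simple damping-inflation rule; it is not the S-Genius implementation, and all function names are illustrative.

    ```python
    # Hedged sketch of a Levenberg-Marquardt step whose damping is adapted
    # so iterates respect parameter definition domains (box bounds here).
    import numpy as np

    def lm_fit(residual, jac, p0, lo, hi, n_iter=50, lam0=1e-3):
        p, lam = np.asarray(p0, float), lam0
        for _ in range(n_iter):
            r, J = residual(p), jac(p)
            g, H = J.T @ r, J.T @ J
            while True:
                step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
                trial = p + step
                # Inflate damping until the step stays inside the domain
                # and decreases the cost (the "adapted damping" idea).
                if np.all(trial >= lo) and np.all(trial <= hi) \
                   and np.sum(residual(trial)**2) < np.sum(r**2):
                    p, lam = trial, max(lam / 3, 1e-12)
                    break
                lam *= 4
                if lam > 1e12:          # give up if no acceptable step exists
                    return p
        return p

    # Toy usage: fit y = a*exp(-b*x) with a in [0, 5] and b in [0, 2].
    x = np.linspace(0, 4, 40)
    y = 2.0 * np.exp(-0.7 * x)
    res = lambda p: p[0] * np.exp(-p[1] * x) - y
    jac = lambda p: np.column_stack([np.exp(-p[1] * x),
                                     -p[0] * x * np.exp(-p[1] * x)])
    print(lm_fit(res, jac, [1.0, 0.1], lo=[0, 0], hi=[5, 2]))
    ```

    Inflating the damping both shortens the step (Levenberg's original idea) and, here, keeps iterates inside the parameters' definition domains; a projected or reflective step would be an equally plausible reading of the abstract.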

  2. Genetic adaptations of the plateau zokor in high-elevation burrows.

    PubMed

    Shao, Yong; Li, Jin-Xiu; Ge, Ri-Li; Zhong, Li; Irwin, David M; Murphy, Robert W; Zhang, Ya-Ping

    2015-11-25

    The plateau zokor (Myospalax baileyi) spends its entire life underground in sealed burrows. Confronting limited oxygen, high carbon dioxide concentrations and complete darkness, it epitomizes a successful physiological adaptation. Here, we employ transcriptome sequencing to explore the genetic underpinnings of its adaptations to this unique habitat. Compared to Rattus norvegicus, genes belonging to GO categories related to energy metabolism (e.g. mitochondrion and fatty acid beta-oxidation) underwent accelerated evolution in the plateau zokor. Furthermore, positively selected genes were significantly enriched in gene categories involved in ATPase activity, blood vessel development and respiratory gaseous exchange, functional categories that are relevant to adaptation to high altitudes. Among the 787 genes with evidence of parallel evolution, and thus identified as candidate genes, several GO categories (e.g. response to hypoxia, oxygen homeostasis and erythrocyte homeostasis) are significantly enriched; among them are two genes, EPAS1 and AJUBA, involved in the response to hypoxia, whose parallel-evolved sites lie at positions that are highly conserved in sequence alignments from multiple species. Thus, accelerated evolution of GO categories, positive selection and parallel evolution at the molecular level provide evidence for parsing the genetic adaptations of the plateau zokor to life in high-elevation burrows.

  3. The dynamics of diverse segmental amplifications in populations of Saccharomyces cerevisiae adapting to strong selection.

    PubMed

    Payen, Celia; Di Rienzi, Sara C; Ong, Giang T; Pogachar, Jamie L; Sanchez, Joseph C; Sunshine, Anna B; Raghuraman, M K; Brewer, Bonita J; Dunham, Maitreya J

    2014-03-20

    Population adaptation to strong selection can occur through the sequential or parallel accumulation of competing beneficial mutations. The dynamics, diversity, and rate of fixation of beneficial mutations within and between populations are still poorly understood. To study how the mutational landscape varies across populations during adaptation, we performed experimental evolution on seven parallel populations of Saccharomyces cerevisiae continuously cultured in limiting sulfate medium. By combining quantitative polymerase chain reaction, array comparative genomic hybridization, restriction digestion and contour-clamped homogeneous electric field gel electrophoresis, and whole-genome sequencing, we followed the trajectory of evolution to determine the identity and fate of beneficial mutations. During a period of 200 generations, the yeast populations displayed parallel evolutionary dynamics that were driven by the coexistence of independent beneficial mutations. Selective amplifications rapidly evolved under this selection pressure, in particular common inverted amplifications containing the sulfate transporter gene SUL1. Compared with single clones, detailed analysis of the populations uncovered a greater complexity whereby multiple subpopulations arise and compete despite the strong selection. The most common evolutionary adaptation to strong selection in these populations grown in sulfate limitation is determined by clonal interference, with adaptive variants both persisting and replacing one another.

  4. The Dynamics of Diverse Segmental Amplifications in Populations of Saccharomyces cerevisiae Adapting to Strong Selection

    PubMed Central

    Payen, Celia; Di Rienzi, Sara C.; Ong, Giang T.; Pogachar, Jamie L.; Sanchez, Joseph C.; Sunshine, Anna B.; Raghuraman, M. K.; Brewer, Bonita J.; Dunham, Maitreya J.

    2014-01-01

    Population adaptation to strong selection can occur through the sequential or parallel accumulation of competing beneficial mutations. The dynamics, diversity, and rate of fixation of beneficial mutations within and between populations are still poorly understood. To study how the mutational landscape varies across populations during adaptation, we performed experimental evolution on seven parallel populations of Saccharomyces cerevisiae continuously cultured in limiting sulfate medium. By combining quantitative polymerase chain reaction, array comparative genomic hybridization, restriction digestion and contour-clamped homogeneous electric field gel electrophoresis, and whole-genome sequencing, we followed the trajectory of evolution to determine the identity and fate of beneficial mutations. During a period of 200 generations, the yeast populations displayed parallel evolutionary dynamics that were driven by the coexistence of independent beneficial mutations. Selective amplifications rapidly evolved under this selection pressure, in particular common inverted amplifications containing the sulfate transporter gene SUL1. Compared with single clones, detailed analysis of the populations uncovered a greater complexity whereby multiple subpopulations arise and compete despite the strong selection. The most common evolutionary adaptation to strong selection in these populations grown in sulfate limitation is determined by clonal interference, with adaptive variants both persisting and replacing one another. PMID:24368781

  5. Parallel trait adaptation across opposing thermal environments in experimental Drosophila melanogaster populations

    PubMed Central

    Tobler, Ray; Hermisson, Joachim; Schlötterer, Christian

    2015-01-01

    Thermal stress is a pervasive selective agent in natural populations that impacts organismal growth, survival, and reproduction. Drosophila melanogaster exhibits a variety of putatively adaptive phenotypic responses to thermal stress in natural and experimental settings; however, accompanying assessments of fitness are typically lacking. Here, we quantify changes in fitness and known thermal tolerance traits in replicated experimental D. melanogaster populations following more than 40 generations of evolution to either cyclic cold or hot temperatures. By evaluating fitness for both evolved populations alongside a reconstituted starting population, we show that the evolved populations were the best adapted within their respective thermal environments. More strikingly, the evolved populations exhibited increased fitness in both environments and improved resistance to both acute heat and cold stress. This unexpected parallel response appeared to be an adaptation to the rapid temperature changes that drove the cycling thermal regimes, as parallel fitness changes were not observed when tested in a constant thermal environment. Our results add to a small, but growing group of studies that demonstrate the importance of fluctuating temperature changes for thermal adaptation and highlight the need for additional work in this area. PMID:26080903

  6. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  7. Parallel computation using boundary elements in solid mechanics

    NASA Technical Reports Server (NTRS)

    Chien, L. S.; Sun, C. T.

    1990-01-01

    The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming the linear variation of displacements and tractions within a line element. Moreover, the MACSYMA symbolic program is employed to obtain analytical results for the influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. Linear speedups and high efficiencies are demonstrated for a sample problem on the Sequent Symmetry S81 parallel computing system.

  8. Solution of a tridiagonal system of equations on the finite element machine

    NASA Technical Reports Server (NTRS)

    Bostic, S. W.

    1984-01-01

    Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
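    Neither code is reproduced in the abstract, but the parallel character of such iterative schemes is easy to illustrate. The sketch below uses an odd-even (red-black) Gauss-Seidel sweep, a standard parallel-friendly iteration related in spirit to, but not necessarily identical with, the Accelerated Parallel Gauss method: within each half-sweep every update is independent and could run on its own processor.

    ```python
    # Hedged illustration (not the original codes): odd-even Gauss-Seidel
    # for a diagonally dominant tridiagonal system. Each half-sweep updates
    # only independent unknowns, so it parallelizes directly.
    import numpy as np

    def redblack_tridiag(a, b, c, d, iters=200):
        """Solve T x = d with sub/diag/super diagonals a, b, c."""
        n = len(b)
        x = np.zeros(n)
        for _ in range(iters):
            for start in (1, 0):               # odd entries, then even entries
                i = np.arange(start, n, 2)
                left = np.where(i > 0, a[i] * x[np.maximum(i - 1, 0)], 0.0)
                right = np.where(i < n - 1, c[i] * x[np.minimum(i + 1, n - 1)], 0.0)
                x[i] = (d[i] - left - right) / b[i]   # all updates independent
        return x

    n = 9
    a = -np.ones(n); c = -np.ones(n); b = 4.0 * np.ones(n)
    d = np.arange(1.0, n + 1)
    x = redblack_tridiag(a, b, c, d)
    T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    print(np.max(np.abs(T @ x - d)))           # small residual after convergence
    ```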

  9. A Parallel Cartesian Approach for External Aerodynamics of Vehicles with Complex Geometry

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Adomavicius, G.

    2001-01-01

    This workshop paper presents the current status in the development of a new approach for the solution of the Euler equations on Cartesian meshes with embedded boundaries in three dimensions on distributed and shared memory architectures. The approach uses adaptively refined Cartesian hexahedra to fill the computational domain. Where these cells intersect the geometry, they are cut by the boundary into arbitrarily shaped polyhedra which receive special treatment by the solver. The presentation documents a newly developed multilevel upwind solver based on a flexible domain-decomposition strategy. One novel aspect of the work is its use of space-filling curves (SFC) for memory-efficient on-the-fly parallelization, dynamic re-partitioning and automatic coarse mesh generation. Within each subdomain the approach employs a variety of reordering techniques so that relevant data are on the same page in memory, permitting high performance on cache-based processors. Details of the on-the-fly SFC-based partitioning are presented, as are construction rules for the automatic coarse mesh generation. After describing the approach, the paper uses model problems and 3-D configurations to both verify and validate the solver. The model problems demonstrate that second-order accuracy is maintained despite the presence of the irregular cut-cells in the mesh. In addition, it examines both parallel efficiency and convergence behavior. These investigations demonstrate a parallel speed-up in excess of 28 on 32 processors of an SGI Origin 2000 system and confirm that mesh partitioning has no effect on convergence behavior.
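    The SFC-based partitioning can be illustrated in a few lines. The sketch below is an illustration of the general technique, not the solver's code: assign each Cartesian cell a Morton (Z-order) key, sort along the resulting one-dimensional curve, and cut the curve into equal pieces; the locality of the curve keeps each piece spatially compact.

    ```python
    # Sketch of space-filling-curve partitioning: Morton keys turn a 3D
    # cell population into a 1D ordering that can be cut into subdomains.
    import numpy as np

    def morton_key(ix, iy, iz, bits=10):
        """Interleave the bits of integer cell coordinates into a Z-order key."""
        key = 0
        for b in range(bits):
            key |= ((ix >> b) & 1) << (3 * b)
            key |= ((iy >> b) & 1) << (3 * b + 1)
            key |= ((iz >> b) & 1) << (3 * b + 2)
        return key

    rng = np.random.default_rng(1)
    cells = rng.integers(0, 1024, size=(10000, 3))        # integer cell coords
    keys = np.array([morton_key(*c) for c in cells])
    order = np.argsort(keys)                              # 1D curve through 3D

    nproc = 8
    chunks = np.array_split(order, nproc)  # contiguous curve segments = subdomains
    print([len(ch) for ch in chunks])
    # Because the curve preserves locality, each contiguous segment tends to
    # be a compact region of space, keeping interpartition communication low.
    ```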

  10. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using a polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Padé and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems, and thus offers the potential for increased time parallelism in time-dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
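    The partial-fraction mechanism behind the Padé-based approach can be made concrete. For a rational approximation R(z) = p(z)/q(z) of e^z, one writes R(z) = c + sum_j w_j/(z - p_j), so applying R(A) to a vector requires one shifted linear solve per pole, and those solves are mutually independent and can proceed in parallel. The sketch below uses the standard (2,2) Padé approximant to illustrate the decomposition; it is not the paper's algorithm.

    ```python
    # Hedged sketch: exp(A) b ~ c*b + sum_j w_j (A - p_j I)^{-1} b via the
    # partial fractions of a (2,2) Pade approximant. Each shifted solve is
    # independent, hence parallelizable across processors.
    import numpy as np
    from scipy.linalg import expm

    num = np.array([1/12,  1/2, 1.0])   # z^2/12 + z/2 + 1  (numerator)
    den = np.array([1/12, -1/2, 1.0])   # z^2/12 - z/2 + 1  (denominator)

    poles = np.roots(den)                           # complex-conjugate pair
    weights = np.polyval(num, poles) / np.polyval(np.polyder(den), poles)
    c = num[0] / den[0]                             # value of R at infinity

    def expm_times_b(A, b):
        n = A.shape[0]
        acc = c * b.astype(complex)
        for p, w in zip(poles, weights):            # independent -> parallel
            acc += w * np.linalg.solve(A - p * np.eye(n), b)
        return acc.real

    A = np.array([[-0.5, 0.2], [0.2, -0.3]])        # small test matrix
    b = np.array([1.0, 2.0])
    print(expm_times_b(A, b), expm(A) @ b)          # close for modest ||A||
    ```

    For a parabolic problem, A would be the (scaled) spatial discretization operator, and the shifted solves would be distributed across processors, one pole per processor group.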

  11. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The solution described is conservative, in the sense that no state information is saved and no 'rollbacks' occur. The approach illustrates both the principal advantage and the principal disadvantage of conservative parallel simulation. The advantage is that, by exploiting lookahead, an approach was found that dramatically improves the serial execution time and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.

  12. A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

    NASA Astrophysics Data System (ADS)

    Ji, X.; Shen, C.

    2017-12-01

    Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocities both demand prohibitively small time steps, even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme in solving the dynamic wave equations and, with the assistance of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in areas of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
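    The cost argument from the CFL restriction can be made concrete with two representative numbers (illustrative values, not the paper's):

    ```python
    # Back-of-envelope illustration of the CFL restriction: dt <= C*dx/|u|.
    # Fine inundation meshes plus fast flood waves force tiny time steps,
    # which is why adaptively switching the high-resolution mesh off in dry
    # regions pays off.
    def cfl_dt(dx_m, speed_ms, courant=0.9):
        return courant * dx_m / speed_ms

    coarse_dry = cfl_dt(dx_m=1000.0, speed_ms=0.5)   # hillslope flow, coarse mesh
    fine_flood = cfl_dt(dx_m=10.0, speed_ms=3.0)     # flood wave, fine mesh
    print(f"coarse/dry dt ~ {coarse_dry:.0f} s, fine/flooded dt ~ {fine_flood:.1f} s")
    # ~1800 s vs ~3 s: a ~600x step-size gap unless the fine mesh is only
    # active where and when the flood wave actually arrives.
    ```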

  13. Skeletal muscle mechanics, energetics and plasticity.

    PubMed

    Lieber, Richard L; Roberts, Thomas J; Blemker, Silvia S; Lee, Sabrina S M; Herzog, Walter

    2017-10-23

    The following papers by Richard Lieber (Skeletal Muscle as an Actuator), Thomas Roberts (Elastic Mechanisms and Muscle Function), Silvia Blemker (Skeletal Muscle has a Mind of its Own: a Computational Framework to Model the Complex Process of Muscle Adaptation) and Sabrina Lee (Muscle Properties of Spastic Muscle (Stroke and CP)) are summaries of their representative contributions for the session on skeletal muscle mechanics, energetics and plasticity at the 2016 Biomechanics and Neural Control of Movement Conference (BANCOM 2016). Dr. Lieber revisits the topic of sarcomere length as a fundamental property of skeletal muscle contraction. Specifically, problems associated with sarcomere length non-uniformity and the role of sarcomerogenesis in diseases such as cerebral palsy are critically discussed. Dr. Roberts then makes us aware of the (often neglected) role of the passive tissues in muscles and discusses the properties of parallel elasticity and series elasticity, and their role in muscle function. Specifically, he identifies the merits of analyzing muscle deformations in three dimensions (rather than just two), because of the potential decoupling of the parallel elastic element length from the contractile element length, and reviews the associated implications for the architectural gear ratio of skeletal muscle contraction. Dr. Blemker then tackles muscle adaptation using a novel way of looking at adaptive processes and what might drive adaptation. She argues that cells do not have pre-programmed behaviors that are controlled by the nervous system. Rather, the adaptive responses of muscle fibers are determined by sub-cellular signaling pathways that are affected by mechanical and biochemical stimuli; an exciting framework with lots of potential. Finally, Dr. Lee takes on the challenging task of determining human muscle properties in vivo. She identifies the dilemma of how we can demonstrate the effectiveness of a treatment, specifically in cases of muscle spasticity following stroke or in children with cerebral palsy. She then discusses the merits of ultrasound based elastography, and the clinical possibilities this technique might hold. Overall, we are treated to a vast array of basic and clinical problems in skeletal muscle mechanics and physiology, with some solutions, and many suggestions for future research.

  14. Durham extremely large telescope adaptive optics simulation platform.

    PubMed

    Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard

    2007-03-01

    Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to characterize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control during the simulation run. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.

  15. Parallel and Preemptable Dynamically Dimensioned Search Algorithms for Single and Multi-objective Optimization in Water Resources

    NASA Astrophysics Data System (ADS)

    Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.

    2015-12-01

    We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms, including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique among most existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and then sending a new candidate solution for evaluation. One key advance in this study is the development of the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption terminates simulation model runs early, prior to completely simulating the model calibration period for example, when intermediate results indicate that the candidate solution is so poor that it will definitely have no influence on the generation of further candidate solutions. The computational savings of deterministic model pre-emption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations of these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems, from multi-core desktops to a supercomputer system) and package them for future modellers within the model-independent calibration software package Ostrich, as well as in MATLAB versions. Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration, and in many cases linear or near-linear speedups are observed.
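    A minimal sketch of how asynchrony and deterministic pre-emption combine is given below. The mock model, the one-variable DDS-style perturbation and all names are illustrative assumptions, not the Ostrich code; the point is that a new candidate is launched the moment any worker finishes (no population barrier), and each run carries the best-known cost so it can terminate itself early.

    ```python
    # Hedged sketch: asynchronous master-worker search with deterministic
    # pre-emption. A run stops as soon as its accumulating cost exceeds the
    # best cost known when it was launched.
    import random
    from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

    def model(x, preempt_at):
        """Mock calibration run: accumulates cost; stops early if hopeless."""
        cost = 0.0
        for t in range(100):                      # "time steps" of the model
            cost += (x - 0.3) ** 2 / 100
            if cost > preempt_at:                 # deterministic pre-emption
                return float("inf")
        return cost

    def perturb(xbest):
        """One-variable DDS-style move: perturb the current best solution."""
        return min(1.0, max(0.0, xbest + random.gauss(0, 0.2)))

    if __name__ == "__main__":
        xbest, fbest = 0.9, float("inf")
        with ProcessPoolExecutor(max_workers=4) as pool:
            pending = {}
            for _ in range(4):
                x = random.random()
                pending[pool.submit(model, x, fbest)] = x
            for _ in range(100):
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:                  # asynchronous: no barrier
                    x, f = pending.pop(fut), fut.result()
                    if f < fbest:
                        xbest, fbest = x, f
                    xn = perturb(xbest)           # launch replacement at once
                    pending[pool.submit(model, xn, fbest)] = xn
        print("best x, cost:", xbest, fbest)
    ```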

  16. Adapting an Effective Counseling Model from Patient-centered Care to Improve Motivation in Clinical Training Programs.

    PubMed

    Hamada, Hisayuki; Martin, Dawn; Batty, Helen P

    2006-12-01

    The value of establishing a patient-centered relationship within the context of the clinical encounter is well documented. The learner-centered method of medical education parallels the patient-centered clinical method; therefore, it should be explored as a method for teaching in the context of the learning encounter. In Japan and other Asian countries, rotations through services not related to the learner's chosen medical specialty are mandatory parts of the medical internship. Participation and effort in these rotations are often met with resistance from learners and are a common problem for medical educators. We adapted the counseling method for patients based on patient-centered methods such as motivational interviewing and solution-focused therapy to address this common problem. We show one case of a medical resident who lost his motivation to learn during his training. A resident has many kinds of mental and physical stress. One such problem arises from the gap between what they want to do and what they have to do. Strategies from motivational interviewing and solution-focused therapy were adapted to successfully resolve a common teaching problem in Japan. A physician teacher (preceptor) helped this resident solve the issue for himself instead of arguing in favor of change. The positive aspects of the counseling method were based on patient-centered medicine and proved useful and effective in counseling for medical residents. We may take the lessons learned from using patient-centered counseling methods to further develop a clear and systematic process of counseling methods for residents to conduct learner-centered medical education.

  17. Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures

    NASA Technical Reports Server (NTRS)

    Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.

    1998-01-01

    In this paper a new algorithm, designated the Fast Invariant Imbedding algorithm, for the solution of the Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate-size problems.

  18. Parallel evolutionary computation in bioinformatics applications.

    PubMed

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of the parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  19. A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing placements of equivalent quality. An integrated place-and-route program for the Intel iPSC/2 Hypercube is currently being developed.
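    The Heuristic Cell-Coloring idea can be sketched with a greedy graph coloring: build a graph whose edges join interacting cells (e.g. cells sharing nets), color it, and then move all cells of one color class in parallel with no buildup of placement-cost error. The toy graph below is illustrative, not an industry netlist, and greedy coloring is only one plausible heuristic.

    ```python
    # Sketch: greedy coloring of a cell-interaction graph; each color class
    # is a set of mutually noninteracting cells that can move concurrently.
    import itertools

    def greedy_coloring(n_cells, adj):
        color = {}
        for cell in range(n_cells):                   # simple greedy order
            taken = {color[v] for v in adj.get(cell, ()) if v in color}
            color[cell] = next(c for c in itertools.count() if c not in taken)
        return color

    # Toy symmetric interaction graph: cells sharing a net interact.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

    colors = greedy_coloring(5, adj)
    by_color = {}
    for cell, c in colors.items():
        by_color.setdefault(c, []).append(cell)
    print(by_color)   # each color class = a set of cells movable in parallel
    ```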

  20. Numerical Methodology for Coupled Time-Accurate Simulations of Primary and Secondary Flowpaths in Gas Turbines

    NASA Technical Reports Server (NTRS)

    Przekwas, A. J.; Athavale, M. M.; Hendricks, R. C.; Steinetz, B. M.

    2006-01-01

    Detailed information of the flow-fields in the secondary flowpaths and their interaction with the primary flows in gas turbine engines is necessary for successful designs with optimized secondary flow streams. Present work is focused on the development of a simulation methodology for coupled time-accurate solutions of the two flowpaths. The secondary flowstream is treated using SCISEAL, an unstructured adaptive Cartesian grid code developed for secondary flows and seals, while the mainpath flow is solved using TURBO, a density based code with capability of resolving rotor-stator interaction in multi-stage machines. An interface is being tested that links the two codes at the rim seal to allow data exchange between the two codes for parallel, coupled execution. A description of the coupling methodology and the current status of the interface development is presented. Representative steady-state solutions of the secondary flow in the UTRC HP Rig disc cavity are also presented.

  1. Quantitative phase-field lattice-Boltzmann study of lamellar eutectic growth under natural convection

    NASA Astrophysics Data System (ADS)

    Zhang, A.; Guo, Z.; Xiong, S.-M.

    2018-05-01

    The influence of natural convection on lamellar eutectic growth was determined by a comprehensive phase-field lattice-Boltzmann study for Al-Cu and CBr4-C2Cl6 eutectic alloys. Mass differences resulting from concentration differences drive the fluid flow, and a robust parallel adaptive mesh refinement algorithm was employed to improve the computational efficiency. By means of carefully designed "numerical experiments", eutectic growth under natural convection was explored and a simple analytical model was proposed to predict the adjustment of the lamellar spacing. Furthermore, by altering the solute expansion coefficient, initial lamellar spacing, and undercooling, the microstructure evolution was presented and compared with the classical eutectic growth theory. Results showed that both the interfacial solute distribution and the average curvature were affected by the natural convection, the effect of which could be further quantified by adding a constant into the growth rule proposed by Jackson and Hunt [Jackson and Hunt, Trans. Metall. Soc. AIME 236, 1129 (1966)].
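    For reference, the "growth rule proposed by Jackson and Hunt" mentioned above is the classical minimum-undercooling relation of eutectic growth theory. The standard form is given below; the specific constant this paper adds to account for convection is not reproduced here.

    ```latex
    % Classical Jackson-Hunt relation: interface undercooling vs. lamellar
    % spacing, with alloy-dependent constants K_1, K_2 and growth velocity V.
    \Delta T = K_1 V \lambda + \frac{K_2}{\lambda},
    \qquad
    \lambda_{\min}^{2} V = \frac{K_2}{K_1},
    \qquad
    \Delta T_{\min} = 2\sqrt{K_1 K_2 V}.
    ```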

  2. Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers

    NASA Technical Reports Server (NTRS)

    Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.

    2014-01-01

    This paper explores the implementation of MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double-cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.

  3. Large-Scale Parallel Simulations of Turbulent Combustion using Combined Dimension Reduction and Tabulation of Chemistry

    DTIC Science & Technology

    2012-05-22

    Dimension reduction is performed using the Rate-Controlled Constrained-Equilibrium (RCCE) method, and tabulation of the reduced space is performed using the In Situ Adaptive Tabulation (ISAT) algorithm. In addition, we use x2f_mpi, a Fortran library for parallel vector-valued function evaluation (used with ISAT in this context), to efficiently redistribute the chemistry workload among the processors.

  4. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    NASA Astrophysics Data System (ADS)

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian (ALE) scheme. The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea-ice forecasting and geophysical process studies over geographical domains of several million square kilometers, such as the Arctic region.

  5. Phylogeographic differentiation versus transcriptomic adaptation to warm temperatures in Zostera marina, a globally important seagrass.

    PubMed

    Jueterbock, A; Franssen, S U; Bergmann, N; Gu, J; Coyer, J A; Reusch, T B H; Bornberg-Bauer, E; Olsen, J L

    2016-11-01

    Populations distributed across a broad thermal cline are instrumental in addressing adaptation to increasing temperatures under global warming. Using a space-for-time substitution design, we tested for parallel adaptation to warm temperatures along two independent thermal clines in Zostera marina, the most widely distributed seagrass in the temperate Northern Hemisphere. A North-South pair of populations was sampled along the European and North American coasts and exposed to a simulated heatwave in a common-garden mesocosm. Transcriptomic responses under control, heat stress and recovery were recorded in 99 RNAseq libraries with ~13 000 uniquely annotated, expressed genes. We corrected for phylogenetic differentiation among populations to discriminate neutral from adaptive differentiation. The two southern populations recovered faster from heat stress and showed parallel transcriptomic differentiation, as compared with northern populations. Among 2389 differentially expressed genes, 21 exceeded neutral expectations and were likely involved in parallel adaptation to warm temperatures. However, the strongest differentiation following phylogenetic correction was between the three Atlantic populations and the Mediterranean population with 128 of 4711 differentially expressed genes exceeding neutral expectations. Although adaptation to warm temperatures is expected to reduce sensitivity to heatwaves, the continued resistance of seagrass to further anthropogenic stresses may be impaired by heat-induced downregulation of genes related to photosynthesis, pathogen defence and stress tolerance. © 2016 John Wiley & Sons Ltd.

  6. Parallel CE/SE Computations via Domain Decomposition

    NASA Technical Reports Server (NTRS)

    Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

    2000-01-01

    This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.

  7. Communications oriented programming of parallel iterative solutions of sparse linear systems

    NASA Technical Reports Server (NTRS)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  8. Temperature compensated photovoltaic array

    DOEpatents

    Mosher, Dan Michael

    1997-11-18

    A temperature compensated photovoltaic module (20) comprised of a series of solar cells (22) having a thermally activated switch (24) connected in parallel with several of the cells (22). The photovoltaic module (20) is adapted to charge conventional batteries having a temperature coefficient (TC) differing from the temperature coefficient (TC) of the module (20). The calibration temperatures of the switches (24) are chosen whereby the colder the ambient temperature for the module (20), the more switches that are on and form a closed circuit to short the associated solar cells (22). By shorting some of the solar cells (22) as the ambient temperature decreases, the battery being charged by the module (20) is not excessively overcharged at lower temperatures. PV module (20) is an integrated solution that is reliable and inexpensive.

  9. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits parallel computation at two levels: each blade row is run in parallel, and each blade row grid is decomposed into several domains that are run in parallel. Twenty processors are used for the 4-blade-row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized as APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows; this outer iteration has been coined a "flip". Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) that the solution in the entire machine is no longer changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee that the parallelization was done correctly. The domain decomposition is done only in the axial direction, since the number of points axially is much larger than in the other two directions. This code uses MPI for message passing. The parallel speedup of the solver portion (excluding I/O and the body-force calculation) is reported for a grid with 227 points axially.

  10. Unstructured Adaptive Grid Computations on an Array of SMPs

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

    1996-01-01

    Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We present such a dynamic load balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits an 'array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantages over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array-of-SMPs architecture.

  11. Development of a solution adaptive unstructured scheme for quasi-3D inviscid flows through advanced turbomachinery cascades

    NASA Technical Reports Server (NTRS)

    Usab, William J., Jr.; Jiang, Yi-Tsann

    1991-01-01

    The objective of the present research is to develop a general solution adaptive scheme for the accurate prediction of inviscid quasi-three-dimensional flow in advanced compressor and turbine designs. The adaptive solution scheme combines an explicit finite-volume time-marching scheme for unstructured triangular meshes and an advancing front triangular mesh scheme with a remeshing procedure for adapting the mesh as the solution evolves. The unstructured flow solver has been tested on a series of two-dimensional airfoil configurations including a three-element analytic test case presented here. Mesh adapted quasi-three-dimensional Euler solutions are presented for three spanwise stations of the NASA rotor 67 transonic fan. Computed solutions are compared with available experimental data.

  12. Mechanisms for Rapid Adaptive Control of Motion Processing in Macaque Visual Cortex.

    PubMed

    McLelland, Douglas; Baker, Pamela M; Ahmed, Bashir; Kohn, Adam; Bair, Wyeth

    2015-07-15

    A key feature of neural networks is their ability to rapidly adjust their function, including signal gain and temporal dynamics, in response to changes in sensory inputs. These adjustments are thought to be important for optimizing the sensitivity of the system, yet their mechanisms remain poorly understood. We studied adaptive changes in temporal integration in direction-selective cells in macaque primary visual cortex, where specific hypotheses have been proposed to account for rapid adaptation. By independently stimulating direction-specific channels, we found that the control of temporal integration of motion at one direction was independent of motion signals driven at the orthogonal direction. We also found that individual neurons can simultaneously support two different profiles of temporal integration for motion in orthogonal directions. These findings rule out a broad range of adaptive mechanisms as being key to the control of temporal integration, including untuned normalization and nonlinearities of spike generation and somatic adaptation in the recorded direction-selective cells. Such mechanisms are too broadly tuned, or occur too far downstream, to explain the channel-specific and multiplexed temporal integration that we observe in single neurons. Instead, we are compelled to conclude that parallel processing pathways are involved, and we demonstrate one such circuit using a computer model. This solution allows processing in different direction/orientation channels to be separately optimized and is sensible given that, under typical motion conditions (e.g., translation or looming), speed on the retina is a function of the orientation of image components. Many neurons in visual cortex are understood in terms of their spatial and temporal receptive fields. It is now known that the spatiotemporal integration underlying visual responses is not fixed but depends on the visual input. For example, neurons that respond selectively to motion direction integrate signals over a shorter time window when visual motion is fast and a longer window when motion is slow. We investigated the mechanisms underlying this useful adaptation by recording from neurons as they responded to stimuli moving in two different directions at different speeds. Computer simulations of our results enabled us to rule out several candidate theories in favor of a model that integrates across multiple parallel channels that operate at different time scales. Copyright © 2015 the authors.

  13. Parallel trait adaptation across opposing thermal environments in experimental Drosophila melanogaster populations.

    PubMed

    Tobler, Ray; Hermisson, Joachim; Schlötterer, Christian

    2015-07-01

    Thermal stress is a pervasive selective agent in natural populations that impacts organismal growth, survival, and reproduction. Drosophila melanogaster exhibits a variety of putatively adaptive phenotypic responses to thermal stress in natural and experimental settings; however, accompanying assessments of fitness are typically lacking. Here, we quantify changes in fitness and known thermal tolerance traits in replicated experimental D. melanogaster populations following more than 40 generations of evolution to either cyclic cold or hot temperatures. By evaluating fitness for both evolved populations alongside a reconstituted starting population, we show that the evolved populations were the best adapted within their respective thermal environments. More strikingly, the evolved populations exhibited increased fitness in both environments and improved resistance to both acute heat and cold stress. This unexpected parallel response appeared to be an adaptation to the rapid temperature changes that drove the cycling thermal regimes, as parallel fitness changes were not observed when tested in a constant thermal environment. Our results add to a small, but growing group of studies that demonstrate the importance of fluctuating temperature changes for thermal adaptation and highlight the need for additional work in this area. © 2015 The Author(s). Evolution published by Wiley Periodicals, Inc. on behalf of The Society for the Study of Evolution.

  14. Mechanisms mediating parallel action monitoring in fronto-striatal circuits.

    PubMed

    Beste, Christian; Ness, Vanessa; Lukas, Carsten; Hoffmann, Rainer; Stüwe, Sven; Falkenstein, Michael; Saft, Carsten

    2012-08-01

    Flexible response adaptation and the control of conflicting information play a pivotal role in daily life. Yet, little is known about the neuronal mechanisms mediating the parallel control of these processes. We examined these mechanisms using a multi-methodological approach that integrated data from event-related potentials (ERPs) with structural MRI data and source localisation using sLORETA. Moreover, we calculated evoked wavelet oscillations. We applied this multi-methodological approach in healthy subjects and in patients in a prodromal phase of a major basal ganglia disorder (i.e., Huntington's disease), to directly focus on fronto-striatal networks. The behavioural data indicated that especially the parallel execution of conflict monitoring and flexible response adaptation was modulated across the examined cohorts. When the two processes do not coincide, a high integrity of fronto-striatal loops seems to be dispensable. The neurophysiological data suggest that conflict monitoring (reflected by the N2 ERP) and working memory processes (reflected by the P3 ERP) differentially contribute to this pattern of results. Flexible response adaptation under the constraint of high conflict processing affected the N2 and P3 ERPs, as well as their delta frequency band oscillations. Yet, modulatory effects were strongest for the N2 ERP and the evoked wavelet oscillations in this time range. The N2 ERPs were localized in the anterior cingulate cortex (BA32, BA24). Modulations of the P3 ERP were localized in parietal areas (BA7). In addition, MRI-determined caudate head volume predicted modulations in conflict monitoring, but not in working memory processes. The results show how parallel conflict monitoring and flexible adaptation of action are mediated via fronto-striatal networks. While both response monitoring and working memory processes seem to play a role, response selection processes and ACC-basal ganglia networks seem to be the driving force in mediating parallel conflict monitoring and flexible adaptation of actions. Copyright © 2012 Elsevier Inc. All rights reserved.

  15. Blow-up of weak solutions to a chemotaxis system under influence of an external chemoattractant

    NASA Astrophysics Data System (ADS)

    Black, Tobias

    2016-06-01

    We study nonnegative radially symmetric solutions of the parabolic-elliptic Keller-Segel whole-space system $u_t = \Delta u - \nabla\cdot(u\nabla v)$, $0 = \Delta v + u + f(x)$, $u(x,0) = u_0(x)$, for $x \in \mathbb{R}^n$, $t > 0$, with prototypical external signal production $f(x) := f_0 |x|^{-\alpha}$ if $|x| \le R - \rho$ and $f(x) := 0$ if $|x| \ge R + \rho$, for $R \in (0,1)$ and $\rho \in (0, R/2)$, which is still integrable but not of class $L^{n/2+\delta_0}(\mathbb{R}^n)$ for some $\delta_0 \in [0,1)$. For corresponding parabolic-parabolic Neumann-type boundary-value problems in bounded domains $\Omega$, where $f \in L^{n/2+\delta_0}(\Omega) \cap C^\alpha(\Omega)$ for some $\delta_0 \in (0,1)$ and $\alpha \in (0,1)$, it is known that the system does not emit blow-up solutions if the quantities $\|u_0\|_{L^{n/2+\delta_0}(\Omega)}$, $\|f\|_{L^{n/2+\delta_0}(\Omega)}$ and $\|v_0\|_{L^{\theta}(\Omega)}$, for some $\theta > n$, are all bounded by some $\varepsilon > 0$ small enough. We will show that whenever $f_0 > \frac{2n}{\alpha}(n-2)(n-\alpha)$ and $u_0 \equiv c_0 > 0$ in $\overline{B_1(0)}$, a measure-valued global-in-time weak solution to the system above can be constructed which blows up immediately. Since these conditions are independent of $R \in (0,1)$ and $c_0 > 0$, we obtain a strong indication that in fact $\delta_0 = 0$ is critical for the existence of global bounded solutions under a smallness condition as described above.

  16. An information-theoretic approach to motor action decoding with a reconfigurable parallel architecture.

    PubMed

    Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C

    2011-01-01

    Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.
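    The MEE criterion the paper accelerates has a compact form: minimizing a quadratic (Renyi) entropy of the errors is equivalent to maximizing the Parzen-window "information potential" V = (1/N^2) sum_ij G_sigma(e_i - e_j). The sketch below trains a linear filter this way in plain NumPy; it is a didactic reconstruction (initialized at the least-squares/Wiener solution the paper compares against), not the FPGA implementation. The all-pairs kernel evaluations are exactly the independent computations the hardware design parallelizes.

    ```python
    # Hedged sketch: minimum-error-entropy training of a linear decoder via
    # gradient ascent on the information potential of the errors.
    import numpy as np

    def mee_train(X, d, sigma=0.5, lr=0.5, epochs=300):
        """Train w to minimize error entropy; X: (N,p) inputs, d: (N,) targets."""
        N, p = X.shape
        w = np.linalg.lstsq(X, d, rcond=None)[0]   # start from the LS/Wiener fit
        for _ in range(epochs):
            e = d - X @ w                          # errors
            de = e[:, None] - e[None, :]           # all pairwise differences
            G = np.exp(-de**2 / (2 * sigma**2))    # Gaussian kernel matrix
            M = G * de                             # antisymmetric weight matrix
            # Gradient of V = (1/N^2) sum_ij G_sigma(e_i - e_j) w.r.t. w:
            grad = (M.sum(axis=1)[:, None] * X - M @ X).sum(axis=0) / (N**2 * sigma**2)
            w += lr * grad                         # ascend V = descend entropy
        return w

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    d = X @ w_true + rng.laplace(0.0, 0.1, 200)    # heavy-tailed noise
    print("LS :", np.linalg.lstsq(X, d, rcond=None)[0])
    print("MEE:", mee_train(X, d))                 # both close to w_true
    ```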

  17. Collisionless stellar hydrodynamics as an efficient alternative to N-body methods

    NASA Astrophysics Data System (ADS)

    Mitchell, Nigel L.; Vorobyov, Eduard I.; Hensler, Gerhard

    2013-01-01

    The dominant constituents of the Universe's matter are believed to be collisionless in nature and thus their modelling in any self-consistent simulation is extremely important. For simulations that deal only with dark matter or stellar systems, the conventional N-body technique is fast, memory efficient and relatively simple to implement. However, when extending simulations to include the effects of gas physics, mesh codes are at a distinct disadvantage compared to Smooth Particle Hydrodynamics (SPH) codes. Whereas implementing the N-body approach into SPH codes is fairly trivial, the particle-mesh technique used in mesh codes to couple collisionless stars and dark matter to the gas on the mesh has a series of significant scientific and technical limitations. These include spurious entropy generation resulting from discreteness effects, poor load balancing and increased communication overhead which spoil the excellent scaling in massively parallel grid codes. In this paper we propose the use of the collisionless Boltzmann moment equations as a means to model the collisionless material as a fluid on the mesh, implementing it into the massively parallel FLASH Adaptive Mesh Refinement (AMR) code. This approach, which we term 'collisionless stellar hydrodynamics', enables us to do away with the particle-mesh approach and, since the parallelization scheme is identical to that used for the hydrodynamics, it preserves the excellent scaling of the FLASH code already demonstrated on peta-flop machines. We find that the classic hydrodynamic equations and the Boltzmann moment equations can be reconciled under specific conditions, allowing us to generate analytic solutions for collisionless systems using conventional test problems. We confirm the validity of our approach using a suite of demanding test problems, including the use of a modified Sod shock test. By deriving the relevant eigenvalues and eigenvectors of the Boltzmann moment equations, we are able to use high order accurate characteristic tracing methods with Riemann solvers to generate numerical solutions which show excellent agreement with our analytic solutions. We conclude by demonstrating the ability of our code to model complex phenomena by simulating the evolution of a two-armed spiral galaxy whose properties agree with those predicted by the swing amplification theory.
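    For orientation, the first two velocity moments of the collisionless Boltzmann equation take the following standard (Jeans-equation) form; the closure and the higher moments actually evolved by the authors are in the paper and are not reproduced here.

    ```latex
    % Zeroth and first moments of the collisionless Boltzmann equation, with
    % density \rho, mean velocity \bar{v}_i, gravitational potential \Phi,
    % and velocity-dispersion (pressure) tensor \Pi_{ij}.
    \frac{\partial \rho}{\partial t}
      + \frac{\partial (\rho \bar{v}_j)}{\partial x_j} = 0,
    \qquad
    \frac{\partial (\rho \bar{v}_i)}{\partial t}
      + \frac{\partial (\rho \bar{v}_i \bar{v}_j + \Pi_{ij})}{\partial x_j}
      = -\rho \frac{\partial \Phi}{\partial x_i},
    \qquad
    \Pi_{ij} = \rho \left( \overline{v_i v_j} - \bar{v}_i \bar{v}_j \right).
    ```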

  18. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
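    The flavor of problem solved here can be illustrated with the chain-structured case: partition a chain of program modules into contiguous blocks, one per processor, minimizing the bottleneck (largest) block load. The simple dynamic program below is a didactic sketch of that case, not Bokhari's Sum-Bottleneck path algorithm itself.

    ```python
    # Sketch: optimal contiguous partition of a module chain across p
    # processors, minimizing the maximum per-processor load.
    import itertools

    def chain_partition(weights, p):
        """Return the minimal bottleneck and the cut points for p processors."""
        n = len(weights)
        prefix = [0] + list(itertools.accumulate(weights))
        seg = lambda i, j: prefix[j] - prefix[i]      # load of modules i..j-1
        INF = float("inf")
        # dp[k][j]: best bottleneck using k processors for the first j modules
        dp = [[INF] * (n + 1) for _ in range(p + 1)]
        cut = [[0] * (n + 1) for _ in range(p + 1)]
        dp[0][0] = 0.0
        for k in range(1, p + 1):
            for j in range(1, n + 1):
                for i in range(j):                    # last block = i..j-1
                    cand = max(dp[k - 1][i], seg(i, j))
                    if cand < dp[k][j]:
                        dp[k][j], cut[k][j] = cand, i
        cuts, j = [], n                               # recover the cut points
        for k in range(p, 0, -1):
            cuts.append(cut[k][j]); j = cut[k][j]
        return dp[p][n], sorted(cuts)[1:]             # drop the leading 0

    loads = [4, 1, 7, 2, 2, 6, 3, 5]
    best, cuts = chain_partition(loads, p=3)
    print(best, cuts)   # bottleneck 12 with blocks [4,1,7] [2,2,6] [3,5]
    ```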

  19. Integration experiences and performance studies of A COTS parallel archive systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Hsing-bung; Scott, Cody; Grider, Gary

    2010-01-01

    Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same things, but at one or more orders of magnitude higher performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially in metadata searching speeds, with more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging the free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address the requirements of future archival storage systems.

  20. Integration experiments and performance studies of a COTS parallel archive system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Hsing-bung; Scott, Cody; Grider, Gary

    2010-06-16

    Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same things, but at one or more orders of magnitude higher performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially in metadata searching speeds, with more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging the free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address the requirements of future archival storage systems.
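
    The core capability both records describe is moving a single large striped file to many tapes in parallel. The sketch below is a toy analogue, not the LANL system: it splits one file into round-robin stripes and copies each stripe in its own process; the file names and stripe geometry are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def copy_stripe(src, dst_dir, stripe, nstripes, chunk=1 << 20):
    """Copy every nstripes-th chunk of src into its own 'tape' file,
    emulating parallel movement of one large striped file."""
    out = os.path.join(dst_dir, f"stripe_{stripe:03d}.dat")
    with open(src, "rb") as fin, open(out, "wb") as fout:
        offset = stripe * chunk
        while True:
            fin.seek(offset)
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            offset += nstripes * chunk
    return out

def parallel_archive(src, dst_dir, nstripes=8):
    """Move one file to nstripes 'tapes' concurrently, one process each."""
    os.makedirs(dst_dir, exist_ok=True)
    with ProcessPoolExecutor(max_workers=nstripes) as pool:
        jobs = [pool.submit(copy_stripe, src, dst_dir, s, nstripes)
                for s in range(nstripes)]
        return [j.result() for j in jobs]

if __name__ == "__main__":
    print(parallel_archive("big_input.dat", "tape_out"))   # hypothetical file
```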

  1. A Generalized Framework for Constrained Design Optimization of General Supersonic Configurations Using Adjoint Based Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Karman, Steve L., Jr.

    2011-01-01

    The Aeronautics Research Mission Directorate (ARMD) sent out a NASA Research Announcement (NRA) soliciting proposals for research and technical development. The proposed research program was aimed at addressing the desired milestones and outcomes of ROA (ROA-2006) Subtopic A.4.1.1, Advanced Computational Methods. The second milestone, SUP.1.06.02, Robust, validated mesh adaptation and error quantification for near-field Computational Fluid Dynamics (CFD), was addressed by the proposed research. Additional research utilizing the direct links to geometry through a CAD interface enabled by this work will allow geometric constraints to be applied and will address the final milestone, SUP2.07.06, Constrained low-drag supersonic aerodynamic design capability. The original product of the proposed research program was an integrated system of tools that can be used for the mesh mechanics required for rapid high-fidelity analysis and for design of supersonic cruise vehicles. These Euler and Navier-Stokes volume grid manipulation tools were proposed to efficiently use parallel processing. The mesh adaptation provides a systematic approach for achieving demonstrated levels of accuracy in the solutions. NASA chose to fund only the mesh generation/adaptation portion of the proposal, so this report describes the completion of the proposed tasks for mesh creation, manipulation and adaptation as they pertain to sonic boom prediction of supersonic configurations.

  2. An adaptive semantic based mediation system for data interoperability among Health Information Systems.

    PubMed

    Khan, Wajahat Ali; Khattak, Asad Masood; Hussain, Maqbool; Amin, Muhammad Bilal; Afzal, Muhammad; Nugent, Christopher; Lee, Sungyoung

    2014-08-01

    Heterogeneity in the management of complex medical data obstructs the attainment of data-level interoperability among Health Information Systems (HIS). This diversity stems from the compliance of HISs with different healthcare standards. Its solution demands a mediation system for the accurate interpretation of data in different heterogeneous formats to achieve data interoperability. We propose an adaptive mediation system, the AdapteR Interoperability ENgine (ARIEN), which arbitrates between HISs compliant with different healthcare standards to deliver accurate and seamless information exchange. ARIEN stores the semantic mapping information between different standards in the Mediation Bridge Ontology (MBO) using ontology matching techniques. These mappings are provided by our System for Parallel Heterogeneity (SPHeRe) matching system and the Personalized-Detailed Clinical Model (P-DCM) approach to guarantee the accuracy of the mappings. The effectiveness of the mappings stored in the MBO is realized by evaluating the accuracy of the transformation process among different standard formats. We evaluated our proposed system on the transformation of medical records between the Clinical Document Architecture (CDA) and Virtual Medical Record (vMR) standards. The transformation process achieved an accuracy of over 90% in conversion between the CDA and vMR standards using a pattern-oriented approach from the MBO. The proposed mediation system improves the overall communication process between HISs and provides accurate and seamless medical information exchange to ensure data interoperability and timely healthcare services to patients.

  3. Empirical Mode Decomposition and Neural Networks on FPGA for Fault Diagnosis in Induction Motors

    PubMed Central

    Garcia-Perez, Arturo; Osornio-Rios, Roque Alfredo; Romero-Troncoso, Rene de Jesus

    2014-01-01

    Nowadays, many industrial applications require online systems that combine several processing techniques in order to offer solutions to complex problems, such as the detection and classification of multiple faults in induction motors. In this work, a novel digital structure to implement the empirical mode decomposition (EMD) for processing nonstationary and nonlinear signals using the full spline-cubic function is presented; it is combined with an adaptive linear network (ADALINE)-based frequency estimator and a feedforward neural network (FFNN)-based classifier to provide an intelligent methodology for the automatic diagnosis, during the startup transient, of motor faults such as one and two broken rotor bars, bearing defects, and unbalance. Moreover, the implementation of the overall methodology in a field-programmable gate array (FPGA) allows online, real-time operation, thanks to its parallelism and high-performance capabilities as a system-on-a-chip (SoC) solution. The detection and classification results show the effectiveness of the proposed fused techniques; in addition, the high precision and minimum resource usage of the developed digital structures make them a suitable and low-cost solution for this and many other industrial applications. PMID:24678281
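
    Of the fused techniques above, the ADALINE-based estimator is the simplest to illustrate. The sketch below is a generic LMS formulation (not the paper's FPGA structure): two weights are adapted against sine/cosine references to track the amplitude of one known frequency component, with an illustrative sampling rate, step size, and test signal.

```python
import numpy as np

def adaline_amplitude(signal, freq, fs, mu=0.01):
    """LMS-adapted linear neuron (ADALINE) that tracks the amplitude of one
    known frequency component by fitting sine/cosine reference inputs."""
    n = np.arange(signal.size)
    refs = np.vstack([np.sin(2 * np.pi * freq * n / fs),
                      np.cos(2 * np.pi * freq * n / fs)])
    w = np.zeros(2)
    for k in range(signal.size):
        x = refs[:, k]
        e = signal[k] - w @ x          # instantaneous estimation error
        w += 2 * mu * e * x            # LMS weight update
    return np.hypot(w[0], w[1])        # amplitude of the tracked component

fs = 1000.0                            # illustrative sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)
sig = 0.8 * np.sin(2 * np.pi * 60 * t) + 0.05 * np.random.randn(t.size)
print(adaline_amplitude(sig, 60.0, fs))   # close to 0.8
```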

  4. Empirical mode decomposition and neural networks on FPGA for fault diagnosis in induction motors.

    PubMed

    Camarena-Martinez, David; Valtierra-Rodriguez, Martin; Garcia-Perez, Arturo; Osornio-Rios, Roque Alfredo; Romero-Troncoso, Rene de Jesus

    2014-01-01

    Nowadays, many industrial applications require online systems that combine several processing techniques in order to offer solutions to complex problems, such as the detection and classification of multiple faults in induction motors. In this work, a novel digital structure to implement the empirical mode decomposition (EMD) for processing nonstationary and nonlinear signals using the full spline-cubic function is presented; it is combined with an adaptive linear network (ADALINE)-based frequency estimator and a feedforward neural network (FFNN)-based classifier to provide an intelligent methodology for the automatic diagnosis, during the startup transient, of motor faults such as one and two broken rotor bars, bearing defects, and unbalance. Moreover, the implementation of the overall methodology in a field-programmable gate array (FPGA) allows online, real-time operation, thanks to its parallelism and high-performance capabilities as a system-on-a-chip (SoC) solution. The detection and classification results show the effectiveness of the proposed fused techniques; in addition, the high precision and minimum resource usage of the developed digital structures make them a suitable and low-cost solution for this and many other industrial applications.

  5. MARE2DEM: a 2-D inversion code for controlled-source electromagnetic and magnetotelluric data

    NASA Astrophysics Data System (ADS)

    Key, Kerry

    2016-10-01

    This work presents MARE2DEM, a freely available code for 2-D anisotropic inversion of magnetotelluric (MT) data and frequency-domain controlled-source electromagnetic (CSEM) data from onshore and offshore surveys. MARE2DEM parametrizes the inverse model using a grid of arbitrarily shaped polygons, where unstructured triangular or quadrilateral grids are typically used due to their ease of construction. Unstructured grids provide significantly more geometric flexibility and parameter efficiency than the structured rectangular grids commonly used by most other inversion codes. Transmitter and receiver components located on topographic slopes can be tilted parallel to the boundary so that the simulated electromagnetic fields accurately reproduce the real survey geometry. The forward solution is implemented with a goal-oriented adaptive finite-element method that automatically generates and refines unstructured triangular element grids that conform to the inversion parameter grid, ensuring accurate responses as the model conductivity changes. This dual-grid approach is significantly more efficient than the conventional use of a single grid for both the forward and inverse meshes since the more detailed finite-element meshes required for accurate responses do not increase the memory requirements of the inverse problem. Forward solutions are computed in parallel with a highly efficient scaling by partitioning the data into smaller independent modeling tasks consisting of subsets of the input frequencies, transmitters and receivers. Non-linear inversion is carried out with a new Occam inversion approach that requires fewer forward calls. Dense matrix operations are optimized for memory and parallel scalability using the ScaLAPACK parallel library. Free parameters can be bounded using a new non-linear transformation that leaves the transformed parameters nearly the same as the original parameters within the bounds, thereby reducing non-linear smoothing effects. Data balancing normalization weights for the joint inversion of two or more data sets encourages the inversion to fit each data type equally well. A synthetic joint inversion of marine CSEM and MT data illustrates the algorithm's performance and parallel scaling on up to 480 processing cores. CSEM inversion of data from the Middle America Trench offshore Nicaragua demonstrates a real world application. The source code and MATLAB interface tools are freely available at http://mare2dem.ucsd.edu.
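
    The parallel forward step exploits the independence of modeling tasks across frequencies, transmitters and receivers. A minimal sketch of that partitioning pattern (not MARE2DEM's actual scheduler; `forward_task` is a hypothetical stand-in for one forward solve) might look like this:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def forward_task(args):
    """Hypothetical stand-in for one independent forward solve (one
    frequency/transmitter/receiver-group combination)."""
    freq, tx, rx_group = args
    # ... a real code would run the 2-D finite-element forward model here ...
    return (freq, tx, rx_group)

def run_forward(frequencies, transmitters, rx_groups, workers=8):
    """Fan the independent modeling tasks out over a pool of processes."""
    tasks = list(product(frequencies, transmitters, rx_groups))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(forward_task, tasks))

if __name__ == "__main__":
    results = run_forward([0.1, 0.25, 1.0], range(10), range(4))
    print(len(results), "independent forward solves completed")
```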

  6. Multi-electrode double layer capacitor having single electrolyte seal and aluminum-impregnated carbon cloth electrodes

    DOEpatents

    Farahmandi, C. J.; Dispennette, J. M.; Blank, E.; Kolb, A. C.

    1999-05-25

    A single cell, multi-electrode high performance double layer capacitor includes first and second flat stacks of electrodes adapted to be housed in a closeable two-part capacitor case which includes only a single electrolyte seal. Each electrode stack has a plurality of electrodes connected in parallel, with the electrodes of one stack being interleaved with the electrodes of the other stack to form an interleaved stack, and with the electrodes of each stack being electrically connected to respective capacitor terminals. A porous separator sleeve is inserted over the electrodes of one stack before interleaving to prevent electrical shorts between the electrodes. The electrodes are made by folding a compressible, low resistance, aluminum-impregnated carbon cloth, made from activated carbon fibers, around a current collector foil, with a tab of the foils of each electrode of each stack being connected in parallel and connected to the respective capacitor terminal. The height of the interleaved stack is somewhat greater than the inside height of the closed capacitor case, thereby requiring compression of the interleaved electrode stack when placed inside of the case, and thereby maintaining the interleaved electrode stack under modest constant pressure. The closed capacitor case is filled with an electrolytic solution and sealed. A preferred electrolytic solution is made by dissolving an appropriate salt into acetonitrile (CH3CN). In one embodiment, the two parts of the capacitor case are conductive and function as the capacitor terminals. 32 figs.

  7. Method of making a multi-electrode double layer capacitor having single electrolyte seal and aluminum-impregnated carbon cloth electrodes

    DOEpatents

    Farahmandi, C. Joseph; Dispennette, John M.; Blank, Edward; Kolb, Alan C.

    2002-09-17

    A single cell, multi-electrode high performance double layer capacitor includes first and second flat stacks of electrodes adapted to be housed in a closeable two-part capacitor case which includes only a single electrolyte seal. Each electrode stack has a plurality of electrodes connected in parallel, with the electrodes of one stack being interleaved with the electrodes of the other stack to form an interleaved stack, and with the electrodes of each stack being electrically connected to respective capacitor terminals. A porous separator is positioned against the electrodes of one stack before interleaving to prevent electrical shorts between the electrodes. The electrodes are made by folding a compressible, low resistance, aluminum-impregnated carbon cloth, made from activated carbon fibers, around a current collector foil, with a tab of the foils of each electrode of each stack being connected in parallel and connected to the respective capacitor terminal. The height of the interleaved stack is somewhat greater than the inside height of the closed capacitor case, thereby requiring compression of the interleaved electrode stack when placed inside of the case, and thereby maintaining the interleaved electrode stack under modest constant pressure. The closed capacitor case is filled with an electrolytic solution and sealed. A preferred electrolytic solution is made by dissolving an appropriate salt into acetonitrile (CH3CN). In one embodiment, the two parts of the capacitor case are conductive and function as the capacitor terminals.

  8. Multi-electrode double layer capacitor having single electrolyte seal and aluminum-impregnated carbon cloth electrodes

    DOEpatents

    Farahmandi, C Joseph [San Diego, CA; Dispennette, John M [Oceanside, CA; Blank, Edward [San Diego, CA; Kolb, Alan C [Rancho Santa Fe, CA

    1999-05-25

    A single cell, multi-electrode high performance double layer capacitor includes first and second flat stacks of electrodes adapted to be housed in a closeable two-part capacitor case which includes only a single electrolyte seal. Each electrode stack has a plurality of electrodes connected in parallel, with the electrodes of one stack being interleaved with the electrodes of the other stack to form an interleaved stack, and with the electrodes of each stack being electrically connected to respective capacitor terminals. A porous separator sleeve is inserted over the electrodes of one stack before interleaving to prevent electrical shorts between the electrodes. The electrodes are made by folding a compressible, low resistance, aluminum-impregnated carbon cloth, made from activated carbon fibers, around a current collector foil, with a tab of the foils of each electrode of each stack being connected in parallel and connected to the respective capacitor terminal. The height of the interleaved stack is somewhat greater than the inside height of the closed capacitor case, thereby requiring compression of the interleaved electrode stack when placed inside of the case, and thereby maintaining the interleaved electrode stack under modest constant pressure. The closed capacitor case is filled with an electrolytic solution and sealed. A preferred electrolytic solution is made by dissolving an appropriate salt into acetonitrile (CH3CN). In one embodiment, the two parts of the capacitor case are conductive and function as the capacitor terminals.

  9. Multi-electrode double layer capacitor having single electrolyte seal and aluminum-impregnated carbon cloth electrodes

    DOEpatents

    Farahmandi, C. Joseph; Dispennette, John M.; Blank, Edward; Kolb, Alan C.

    1999-01-19

    A single cell, multi-electrode high performance double layer capacitor includes first and second flat stacks of electrodes adapted to be housed in a closeable two-part capacitor case which includes only a single electrolyte seal. Each electrode stack has a plurality of electrodes connected in parallel, with the electrodes of one stack being interleaved with the electrodes of the other stack to form an interleaved stack, and with the electrodes of each stack being electrically connected to respective capacitor terminals. A porous separator sleeve is inserted over the electrodes of one stack before interleaving to prevent electrical shorts between the electrodes. The electrodes are made by folding a compressible, low resistance, aluminum-impregnated carbon cloth, made from activated carbon fibers, around a current collector foil, with a tab of the foils of each electrode of each stack being connected in parallel and connected to the respective capacitor terminal. The height of the interleaved stack is somewhat greater than the inside height of the closed capacitor case, thereby requiring compression of the interleaved electrode stack when placed inside of the case, and thereby maintaining the interleaved electrode stack under modest constant pressure. The closed capacitor case is filled with an electrolytic solution and sealed. A preferred electrolytic solution is made by dissolving an appropriate salt into acetonitrile (CH3CN). In one embodiment, the two parts of the capacitor case are conductive and function as the capacitor terminals.

  10. Multi-electrode double layer capacitor having single electrolyte seal and aluminum-impregnated carbon cloth electrodes

    DOEpatents

    Farahmandi, C.J.; Dispennette, J.M.; Blank, E.; Kolb, A.C.

    1999-01-19

    A single cell, multi-electrode high performance double layer capacitor includes first and second flat stacks of electrodes adapted to be housed in a closeable two-part capacitor case which includes only a single electrolyte seal. Each electrode stack has a plurality of electrodes connected in parallel, with the electrodes of one stack being interleaved with the electrodes of the other stack to form an interleaved stack, and with the electrodes of each stack being electrically connected to respective capacitor terminals. A porous separator sleeve is inserted over the electrodes of one stack before interleaving to prevent electrical shorts between the electrodes. The electrodes are made by folding a compressible, low resistance, aluminum-impregnated carbon cloth, made from activated carbon fibers, around a current collector foil, with a tab of the foils of each electrode of each stack being connected in parallel and connected to the respective capacitor terminal. The height of the interleaved stack is somewhat greater than the inside height of the closed capacitor case, thereby requiring compression of the interleaved electrode stack when placed inside of the case, and thereby maintaining the interleaved electrode stack under modest constant pressure. The closed capacitor case is filled with an electrolytic solution and sealed. A preferred electrolytic solution is made by dissolving an appropriate salt into acetonitrile (CH3CN). In one embodiment, the two parts of the capacitor case are conductive and function as the capacitor terminals. 32 figs.

  11. Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications

    NASA Astrophysics Data System (ADS)

    Shimojo, Fuyuki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2008-02-01

    A linear-scaling algorithm based on a divide-and-conquer (DC) scheme has been designed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of density functional theory (DFT). Electronic wave functions are represented on a real-space grid, which is augmented with a coarse multigrid to accelerate the convergence of iterative solutions and with adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid DC-DFT algorithm on massively parallel computers. The largest benchmark tests include an 11.8×10^6-atom (1.04×10^12 electronic degrees of freedom) calculation on 131,072 IBM BlueGene/L processors. The DC-DFT algorithm has well-defined parameters to control the data locality, with which the solutions converge rapidly. Also, the total energy is well conserved during the MD simulation. We perform first-principles MD simulations based on the DC-DFT algorithm, in which large system sizes yield excellent agreement with x-ray scattering measurements for the pair-distribution function of liquid Rb and allow the description of low-frequency vibrational modes of graphene. The band gap of a CdSe nanorod calculated by the DC-DFT algorithm agrees well with the available conventional DFT results. With the DC-DFT algorithm, the band gap is calculated for larger system sizes until the result reaches the asymptotic value.
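
    The essence of the DC scheme is that each spatial cell, extended by a buffer region, is solved independently, and only the contributions of core atoms are summed. The 1-D toy below illustrates that bookkeeping only, with a made-up nearest-neighbour energy standing in for the local quantum-mechanical solve; the cell count and buffer width are arbitrary.

```python
import numpy as np

def local_energy(xs):
    """Toy per-atom energy from nearest-neighbour spacings; a stand-in for
    the local quantum-mechanical solve on one cell's real-space grid."""
    e = np.zeros_like(xs)
    if xs.size > 1:
        d = np.diff(xs)                 # xs must be sorted
        e[:-1] += 0.5 / d
        e[1:] += 0.5 / d
    return e

def dc_energy_1d(x, L, ncells, buf):
    """Divide-and-conquer bookkeeping: solve each cell plus its buffer
    independently, then keep only the contributions of core atoms."""
    h, total = L / ncells, 0.0
    for c in range(ncells):
        lo, hi = c * h, (c + 1) * h
        ext = np.sort(x[(x >= lo - buf) & (x < hi + buf)])
        per_atom = local_energy(ext)    # independent local "solve"
        core = (ext >= lo) & (ext < hi)
        total += per_atom[core].sum()   # core contributions only
    return total

x = np.sort(np.random.rand(200))        # 200 "atoms" in [0, 1)
print(dc_energy_1d(x, 1.0, ncells=8, buf=0.05))
```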

  12. Method of making a multi-electrode double layer capacitor having single electrolyte seal and aluminum-impregnated carbon cloth electrodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farahmandi, C. Joseph; Dispennette, John M.; Blank, Edward

    A method of making a double layer capacitor includes first and second flat stacks of electrodes adapted to be housed in a closeable two-part capacitor case which includes only a single electrolyte seal. Each electrode stack has a plurality of electrodes connected in parallel, with the electrodes of one stack being interleaved with the electrodes of the other stack to form an interleaved stack, and with the electrodes of each stack being electrically connected to respective capacitor terminals. A porous separator is positioned against the electrodes of one stack before interleaving to prevent electrical shorts between the electrodes. The electrodes are made by folding a compressible, low resistance, aluminum-impregnated carbon cloth, made from activated carbon fibers, around a current collector foil, with a tab of the foils of each electrode of each stack being connected in parallel and connected to the respective capacitor terminal. The height of the interleaved stack is somewhat greater than the inside height of the closed capacitor case, thereby requiring compression of the interleaved electrode stack when placed inside of the case, and thereby maintaining the interleaved electrode stack under modest constant pressure. The closed capacitor case is filled with an electrolytic solution and sealed. A preferred electrolytic solution is made by dissolving an appropriate salt into acetonitrile (CH3CN). In one embodiment, the two parts of the capacitor case are conductive and function as the capacitor terminals.

  13. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2017-01-01

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event-generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event-generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application, and the performance that was achieved.

  14. A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure

    PubMed Central

    Fontaine, Michael D.

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which integrate heterogeneous traffic data from different kinds of sensors and apply it to ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to these problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690

  15. A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.

    PubMed

    Xia, Yingjie; Hu, Jia; Fontaine, Michael D

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which integrate heterogeneous traffic data from different kinds of sensors and apply it to ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to these problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing.

  16. A Green's function method for two-dimensional reactive solute transport in a parallel fracture-matrix system

    NASA Astrophysics Data System (ADS)

    Chen, Kewei; Zhan, Hongbin

    2018-06-01

    The reactive solute transport in a single fracture bounded by upper and lower matrixes is a classical problem that captures the dominant factors affecting transport behavior beyond the pore scale. A parallel fracture-matrix system, which considers the interaction among multiple parallel fractures, is an extension of the single fracture-matrix system. The existing analytical or semi-analytical solutions for solute transport in a parallel fracture-matrix system simplify the problem to various degrees, such as neglecting the transverse dispersion in the fracture and/or the longitudinal diffusion in the matrix. The difficulty of solving the full two-dimensional (2-D) problem lies in the calculation of the mass exchange between the fracture and the matrix. In this study, we propose an innovative Green's function approach to address the 2-D reactive solute transport in a parallel fracture-matrix system. The flux at the interface is calculated numerically. It is found that the transverse dispersion in the fracture can be safely neglected due to the small scale of the fracture aperture. However, neglecting the longitudinal matrix diffusion would overestimate the concentration profile near the solute entrance face and underestimate the concentration profile at the far side. The error caused by neglecting the longitudinal matrix diffusion decreases with increasing Peclet number. The longitudinal matrix diffusion does not have an obvious influence on the concentration profile in the long term. The developed model is applied to a dense non-aqueous-phase-liquid (DNAPL) contamination field case in the New Haven Arkose of Connecticut, USA, to estimate trichloroethylene (TCE) behavior over 40 years. The ratio of TCE mass stored in the matrix to the injected TCE mass rises above 90% in less than 10 years.

  17. Fast Particle Methods for Multiscale Phenomena Simulations

    NASA Technical Reports Server (NTRS)

    Koumoutsakos, P.; Wray, A.; Shariff, K.; Pohorille, Andrew

    2000-01-01

    We are developing particle methods aimed at improving the computational modeling capabilities of multiscale physical phenomena in: (i) high Reynolds number unsteady vortical flows, (ii) particle-laden and interfacial flows, and (iii) molecular dynamics studies of nanoscale droplets and studies of the structure, functions, and evolution of the earliest living cell. The unifying computational approach involves particle methods implemented on parallel computer architectures. The inherent adaptivity, robustness and efficiency of particle methods make them a multidisciplinary computational tool capable of bridging the gap between micro-scale and continuum flow simulations. Using efficient tree data structures, multipole expansion algorithms, and improved particle-grid interpolation, particle methods allow for simulations using millions of computational elements, making possible the resolution of a wide range of length and time scales of these important physical phenomena. The current challenges in these simulations are: (i) the proper formulation of particle methods at the molecular and continuum levels for the discretization of the governing equations; (ii) the resolution of the wide range of time and length scales governing the phenomena under investigation; (iii) the minimization of numerical artifacts that may interfere with the physics of the systems under consideration; and (iv) the parallelization of processes such as tree traversal and grid-particle interpolations. We are conducting simulations using vortex methods, molecular dynamics and smoothed particle hydrodynamics, exploiting their unifying concepts such as the solution of the N-body problem on parallel computers, highly accurate particle-particle and grid-particle interpolations, parallel FFTs, and the formulation of processes such as diffusion in the context of particle methods. This approach enables us to bridge seemingly unrelated areas of research.

  18. Creation of parallel algorithms for the solution of problems of gas dynamics on multi-core computers and GPU

    NASA Astrophysics Data System (ADS)

    Rybakin, B.; Bogatencov, P.; Secrieru, G.; Iliuha, N.

    2013-10-01

    The paper deals with a parallel algorithm for calculations on multiprocessor computers and GPU accelerators. Results for the interaction of shock waves with a low-density bubble and for the problem of gas flow under gravity are presented. The algorithm combines high-resolution shock capturing, the second-order accuracy of TVD schemes, and the low numerical diffusion of the advection scheme. Many complex problems of continuum mechanics are numerically solved on structured or unstructured grids. To improve the accuracy of the calculations it is necessary to choose a sufficiently fine grid (with a small cell size), which has the drawback of substantially increasing the computation time. For the calculation of complex problems it is therefore reasonable to use the method of Adaptive Mesh Refinement: the grid is refined only in the areas of interest, where, for example, shock waves are generated, or where complex geometry or other such features exist. Thus, the computing time is greatly reduced. In addition, the execution of the application on the resulting sequence of nested, successively finer grids can be parallelized. The proposed algorithm is based on the AMR method. The AMR method can significantly improve the resolution of the difference grid in areas of high interest and, at the same time, accelerate the calculation of multi-dimensional problems. Parallel algorithms for the analyzed difference models were implemented for calculations on graphics processors using CUDA technology [1].
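
    The AMR strategy described above hinges on a sensor that flags only the regions of interest, such as forming shock waves, for refinement. A minimal sketch of a gradient-based flagging criterion (far simpler than a production AMR sensor; the threshold and test profile are illustrative) follows:

```python
import numpy as np

def flag_for_refinement(u, dx, threshold):
    """Flag cells whose normalized gradient exceeds a threshold; flagged
    regions are where an AMR code would place finer child grids."""
    grad = np.abs(np.gradient(u, dx))
    scale = np.max(np.abs(u)) + 1e-30       # guard against all-zero fields
    return grad * dx / scale > threshold

x = np.linspace(0.0, 1.0, 256)
u = np.tanh((x - 0.5) / 0.01)               # a sharp, shock-like profile
flags = flag_for_refinement(u, x[1] - x[0], threshold=0.02)
print("cells flagged for refinement:", int(flags.sum()))
```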

  19. Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver

    NASA Technical Reports Server (NTRS)

    Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)

    2002-01-01

    The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.

  20. Parallel-Connected Photovoltaic Inverters: Zero Frequency Sequence Harmonic Analysis and Solution

    NASA Astrophysics Data System (ADS)

    Carmeli, Maria Stefania; Mauri, Marco; Frosio, Luisa; Bezzolato, Alberto; Marchegiani, Gabriele

    2013-05-01

    High-power photovoltaic (PV) plants are usually composed of several connected PV subfields, each with its own interface transformer. Different solutions have been studied to improve the efficiency of the whole generation system; in particular, transformerless configurations are the most attractive from an efficiency and cost point of view. This paper focuses on transformerless PV configurations characterised by the parallel connection of interface inverters. The problem of zero-sequence current, due to both the parallel connection and the presence of undesirable parasitic earth capacitances, is considered, and a solution consisting of the synchronisation of the pulse-width modulation triangular carriers is proposed and theoretically analysed. The theoretical analysis has been validated through simulation and experimental results.

  1. Automating FEA programming

    NASA Technical Reports Server (NTRS)

    Sharma, Naveen

    1992-01-01

    In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
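
    The symbolic-then-numeric pipeline that PIER automates can be mimicked in miniature with a modern computer algebra system. The sketch below uses SymPy rather than Common Lisp, purely for illustration: it derives the stiffness matrix of a 1-D linear bar element symbolically and then emits a numerical routine from the result.

```python
import sympy as sp

# Linear 1-D bar element on the reference coordinate xi in [0, 1].
xi, E, A, L = sp.symbols('xi E A L', positive=True)
N = sp.Matrix([1 - xi, xi])                  # shape functions
B = N.diff(xi) / L                           # strain-displacement: d/dx = (d/dxi)/L

# Stiffness K = integral of B * EA * B^T over the element, with dx = L dxi.
K = (E * A * (B * B.T)).applyfunc(lambda f: sp.integrate(f, (xi, 0, 1))) * L
print(K)        # Matrix([[A*E/L, -A*E/L], [-A*E/L, A*E/L]])

# Generate a numerical routine from the symbolic result, as a code generator would.
k_func = sp.lambdify((E, A, L), K, 'numpy')
print(k_func(210e9, 1e-4, 2.0))
```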

  2. Probabilistic structural mechanics research for parallel processing computers

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Martin, William R.

    1991-01-01

    Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical.
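
    PSM computations parallelize naturally because the repeated structural analyses are independent. The sketch below is a toy illustration of that structure only (a simple resistance-minus-load limit state with invented distributions, not one of the PSM methods discussed): each worker samples its own batch and the failure counts are merged.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def failures_in_batch(args):
    """Count limit-state exceedances g = R - S < 0 in one batch; R is a
    member resistance and S a load effect, both with invented parameters."""
    n, seed = args
    rng = np.random.default_rng(seed)
    R = rng.normal(10.0, 1.5, n)            # resistance samples
    S = rng.normal(6.0, 2.0, n)             # load-effect samples
    return int(np.sum(R - S < 0.0))

def failure_probability(total=1_000_000, workers=4):
    """Independent batches run in parallel; counts are merged at the end."""
    per = total // workers
    batches = [(per, seed) for seed in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        fails = sum(pool.map(failures_in_batch, batches))
    return fails / (per * workers)

if __name__ == "__main__":
    print("estimated failure probability:", failure_probability())
```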

  3. A massively parallel computational approach to coupled thermoelastic/porous gas flow problems

    NASA Technical Reports Server (NTRS)

    Shia, David; Mcmanus, Hugh L.

    1995-01-01

    A new computational scheme for coupled thermoelastic/porous gas flow problems is presented. Heat transfer, gas flow, and dynamic thermoelastic governing equations are expressed in fully explicit form, and solved on a massively parallel computer. The transpiration cooling problem is used as an example problem. The numerical solutions have been verified by comparison to available analytical solutions. Transient temperature, pressure, and stress distributions have been obtained. Small spatial oscillations in pressure and stress have been observed, which would be impractical to predict with previously available schemes. Comparisons between serial and massively parallel versions of the scheme have also been made. The results indicate that for small scale problems the serial and parallel versions use practically the same amount of CPU time. However, as the problem size increases the parallel version becomes more efficient than the serial version.
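
    Fully explicit updates of the kind used above map naturally onto massively parallel hardware, since every grid point is advanced independently from the previous time level. A minimal 1-D sketch of such an explicit conduction update (material constants, grid, and boundary values are illustrative) is:

```python
import numpy as np

# Fully explicit FTCS update for 1-D transient conduction, u_t = alpha*u_xx;
# stability requires alpha*dt/dx**2 <= 0.5. All constants are illustrative.
alpha, nx, dx = 1.0e-5, 101, 0.01
dt = 0.4 * dx**2 / alpha               # safely inside the stability limit
u = np.zeros(nx)
u[0] = 1.0                             # heated left boundary

for _ in range(5000):
    # Each interior point depends only on the previous time level,
    # so this update is trivially parallel across the grid.
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    u[0], u[-1] = 1.0, 0.0             # Dirichlet boundaries

print("temperature at mid-plane:", u[nx // 2])
```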

  4. Investigation of the effects of color on judgments of sweetness using a taste adaptation method.

    PubMed

    Hidaka, Souta; Shimoda, Kazumasa

    2014-01-01

    It has been reported that color can affect the judgment of taste. For example, a dark red color enhances the subjective intensity of sweetness. However, the underlying mechanisms of the effect of color on taste have not been fully investigated; in particular, it remains unclear whether the effect is based on cognitive/decisional or perceptual processes. Here, we investigated the effect of color on sweetness judgments using a taste adaptation method. A sweet solution whose color was subjectively congruent with sweetness was judged as sweeter than an uncolored sweet solution both before and after adaptation to an uncolored sweet solution. In contrast, subjective judgment of sweetness for uncolored sweet solutions did not differ between the conditions following adaptation to a colored sweet solution and following adaptation to an uncolored one. Color affected sweetness judgment when the target solution was colored, but the colored sweet solution did not modulate the magnitude of taste adaptation. Therefore, it is concluded that the effect of color on the judgment of taste would occur mainly in cognitive/decisional domains.

  5. C-C1-04: Building a Health Services Information Technology Research Environment

    PubMed Central

    Gehrum, David W; Jones, JB; Romania, Gregory J; Young, David L; Lerch, Virginia R; Bruce, Christa A; Donkochik, Diane; Stewart, Walter F

    2010-01-01

    Background: The electronic health record (EHR) has opened a new era for health services research (HSR) where information technology (IT) is used to re-engineer care processes. While the EHR provides one means of advancing novel solutions, a promising strategy is to develop tools (e.g., online questionnaires, visual display tools, decision support) distinct from, but which interact with, the EHR. Development of such software tools outside the EHR offers an advantage in flexibility, sophistication, and ultimately in portability to other settings. However, institutional IT departments have an imperative to protect patient data and to standardize IT processes to ensure system-level security and support traditional business needs. Such imperatives usually present formidable process barriers to testing novel software solutions. We describe how, in collaboration with our IT department, we are creating an environment and a process that allows for routine and rapid testing of novel software solutions. Methods: We convened a working group consisting of IT and research personnel with expertise in information security, database design/management, web design, EHR programming, and health services research. The working group was tasked with developing a research IT environment to accomplish two objectives: maintain network/ data security and regulatory compliance; allow researchers working with external vendors to rapidly prototype and, in a clinical setting, test web-based tools. Results: Two parallel solutions, one focused on hardware, the second on oversight and management, were developed. First, we concluded that three separate, staged development environments were required to allow external vendor access for testing software and for transitioning software to be used in a clinic. In parallel, the extant oversight process for approving/managing access to internal/external personnel had to be altered to reflect the scope and scale of discrete research projects, as opposed to an enterpriselevel approach to IT management. Conclusions: Innovation in health services software development requires a flexible, scalable IT environment adapted to the unique objectives of a HSR software development model. In our experience, implementing the hardware solution is less challenging than the cultural change required to implement such a model and the modifications to administrative and oversight processes to sustain an environment for rapid product development and testing.

  6. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    NASA Astrophysics Data System (ADS)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing numbers of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with its own transport matrices, coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods were investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, these problems are of much less practical interest, further limiting the instances where it would be beneficial to select the ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of the scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of the number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times.
Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
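
    The parallel block Jacobi (PBJ) iteration mentioned above is easy to sketch in dense-matrix form: each sub-domain block is solved independently, and the blocks are coupled only through the previous iterate. The toy below is a generic block Jacobi on a diagonally dominant test matrix, not the ITMM operators, and shows the synchronous update structure.

```python
import numpy as np

def block_jacobi(A, b, nblocks, iters=200):
    """Block Jacobi sketch: invert each diagonal block once, then iterate
    x_k <- D_k^{-1} (b_k - sum_{j != k} A_kj x_j); every block update is
    independent, mirroring the sub-domain iteration."""
    n = len(b)
    bounds = np.linspace(0, n, nblocks + 1, dtype=int)
    slices = [slice(bounds[i], bounds[i + 1]) for i in range(nblocks)]
    Dinv = [np.linalg.inv(A[s, s]) for s in slices]    # local solves
    x = np.zeros(n)
    for _ in range(iters):
        x_new = np.empty(n)
        for s, Di in zip(slices, Dinv):
            r = b[s] - A[s, :] @ x + A[s, s] @ x[s]    # strip off-block terms
            x_new[s] = Di @ r
        x = x_new                                       # synchronous update
    return x

# Diagonally dominant test system (block Jacobi converges for it).
n = 64
A = np.eye(n) * 4 + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
x = block_jacobi(A, b, nblocks=8)
print("residual:", np.linalg.norm(A @ x - b))
```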

  7. Geopotential Error Analysis from Satellite Gradiometer and Global Positioning System Observables on Parallel Architecture

    NASA Technical Reports Server (NTRS)

    Schutz, Bob E.; Baker, Gregory A.

    1997-01-01

    The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun-synchronous gradiometer orbit produces average geoid height variances of 17 millimeters.

  8. Geopotential error analysis from satellite gradiometer and global positioning system observables on parallel architectures

    NASA Astrophysics Data System (ADS)

    Baker, Gregory Allen

    The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun-synchronous gradiometer orbit produces average geoid height variances of 17 millimeters.

  9. One-dimensional models of quasi-neutral parallel electric fields

    NASA Technical Reports Server (NTRS)

    Stern, D. P.

    1981-01-01

    Parallel electric fields can exist in the magnetic mirror geometry of auroral field lines if they conform to the quasi-neutral equilibrium solutions. Results on quasi-neutral equilibria and on double-layer discontinuities are reviewed, and the effects on such equilibria of non-unique solutions, potential barriers, and field-aligned current flows are examined, using monoenergetic isotropic distribution functions as inputs.

  10. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Lesoinne, Michel

    1993-01-01

    Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
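
    Among the cost-effective geometric strategies in this family, recursive coordinate bisection is perhaps the simplest to state: cut the point set at the median of its longest axis and recurse. A small sketch (random points stand in for mesh vertex coordinates) follows:

```python
import numpy as np

def rcb(points, ids, depth):
    """Recursive coordinate bisection: split at the median of the longest
    axis until 2**depth sub-domains remain."""
    if depth == 0:
        return [ids]
    axis = int(np.argmax(np.ptp(points, axis=0)))  # axis of longest extent
    order = np.argsort(points[:, axis])
    half = len(ids) // 2
    lo, hi = order[:half], order[half:]
    return (rcb(points[lo], ids[lo], depth - 1) +
            rcb(points[hi], ids[hi], depth - 1))

pts = np.random.rand(1000, 2)                      # stand-in vertex coordinates
parts = rcb(pts, np.arange(len(pts)), depth=3)     # 2**3 = 8 partitions
print([len(p) for p in parts])                     # near-equal sizes
```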

  11. Automated three-component synthesis of a library of γ-lactams

    PubMed Central

    Fenster, Erik; Hill, David; Reiser, Oliver

    2012-01-01

    A three-component method for the synthesis of γ-lactams from commercially available maleimides, aldehydes, and amines was adapted to parallel library synthesis. Improvements to the chemistry over previous efforts include the optimization of the method to a one-pot process, the management of by-products and excess reagents, the development of an automated parallel sequence, and the adaptation of the method to permit the preparation of enantiomerically enriched products. These efforts culminated in the preparation of a library of 169 γ-lactams. PMID:23209515

  12. LMC: Logarithmantic Monte Carlo

    NASA Astrophysics Data System (ADS)

    Mantz, Adam B.

    2017-06-01

    LMC is a Markov Chain Monte Carlo engine in Python that implements adaptive Metropolis-Hastings and slice sampling, as well as the affine-invariant method of Goodman & Weare, in a flexible framework. It can be used for simple problems, but the main use case is problems where expensive likelihood evaluations are provided by less flexible third-party software, which benefit from parallelization across many nodes at the sampling level. The parallel/adaptive methods use communication through MPI, or alternatively by writing/reading files, and mostly follow the approaches pioneered by CosmoMC (ascl:1106.025).
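
    A minimal illustration of the adaptive Metropolis-Hastings idea that LMC implements is sketched below. This is not LMC's own code: the periodic rescaling of the proposal width toward a target acceptance rate is a common generic tuning rule, and the target posterior is a toy Gaussian.

```python
import numpy as np

def adaptive_metropolis(logpost, x0, steps=20000, adapt_every=200):
    """Minimal adaptive Metropolis-Hastings: the proposal width is rescaled
    periodically toward a ~25% acceptance rate (a generic tuning rule)."""
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    lp = logpost(x)
    scale, accepted, chain = 0.5, 0, []
    for i in range(1, steps + 1):
        prop = x + scale * rng.standard_normal(x.size)
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:    # Metropolis accept test
            x, lp = prop, lp_prop
            accepted += 1
        chain.append(x.copy())
        if i % adapt_every == 0:                   # periodic adaptation step
            scale *= np.exp(accepted / adapt_every - 0.25)
            accepted = 0
    return np.array(chain)

# Toy target: a standard 2-D Gaussian posterior.
chain = adaptive_metropolis(lambda x: -0.5 * np.sum(x**2), [3.0, -3.0])
print(chain[10000:].mean(axis=0), chain[10000:].std(axis=0))
```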

  13. Unstructured grids on SIMD torus machines

    NASA Technical Reports Server (NTRS)

    Bjorstad, Petter E.; Schreiber, Robert

    1994-01-01

    Unstructured grids lead to unstructured communication on distributed memory parallel computers, a problem that has been considered difficult. Here, we consider adaptive, offline communication routing for a SIMD processor grid. Our approach is empirical. We use large data sets drawn from supercomputing applications instead of an analytic model of communication load. The chief contribution of this paper is an experimental demonstration of the effectiveness of certain routing heuristics. Our routing algorithm is adaptive, nonminimal, and is generally designed to exploit locality. We have a parallel implementation of the router, and we report on its performance.

  14. Independent Axes of Genetic Variation and Parallel Evolutionary Divergence Of Opercle Bone Shape in Threespine Stickleback

    PubMed Central

    Kimmel, Charles B.; Cresko, William A.; Phillips, Patrick C.; Ullmann, Bonnie; Currey, Mark; von Hippel, Frank; Kristjánsson, Bjarni K.; Gelmond, Ofer; McGuigan, Katrina

    2014-01-01

    Evolution of similar phenotypes in independent populations is often taken as evidence of adaptation to the same fitness optimum. However, the genetic architecture of traits might cause evolution to proceed more often toward particular phenotypes, and less often toward others, independently of the adaptive value of the traits. Freshwater populations of Alaskan threespine stickleback have repeatedly evolved the same distinctive opercle shape after divergence from an oceanic ancestor. Here we demonstrate that this pattern of parallel evolution is widespread, distinguishing oceanic and freshwater populations across the Pacific Coast of North America and Iceland. We test whether this parallel evolution reflects genetic bias by estimating the additive genetic variance–covariance matrix (G) of opercle shape in an Alaskan oceanic (putative ancestral) population. We find significant additive genetic variance for opercle shape and that G has the potential to be biasing, because of the existence of regions of phenotypic space with low additive genetic variation. However, evolution did not occur along major eigenvectors of G, rather it occurred repeatedly in the same directions of high evolvability. We conclude that the parallel opercle evolution is most likely due to selection during adaptation to freshwater habitats, rather than due to biasing effects of opercle genetic architecture. PMID:22276538

  15. MAG3D and its application to internal flowfield analysis

    NASA Technical Reports Server (NTRS)

    Lee, K. D.; Henderson, T. L.; Choo, Y. K.

    1992-01-01

    MAG3D (multiblock adaptive grid, 3D) is a 3D solution-adaptive grid generation code which redistributes grid points to improve the accuracy of a flow solution without increasing the number of grid points. The code is applicable to structured grids with a multiblock topology. It is independent of the original grid generator and the flow solver. The code uses the coordinates of an initial grid and the flow solution interpolated onto the new grid. MAG3D uses a numerical mapping and potential theory to modify the grid distribution based on properties of the flow solution on the initial grid. The adaptation technique is discussed, and the capability of MAG3D is demonstrated with several internal flow examples. Advantages of using solution-adaptive grids are also shown by comparing flow solutions on adaptive grids with those on initial grids.

  16. The block adaptive multigrid method applied to the solution of the Euler equations

    NASA Technical Reports Server (NTRS)

    Pantelelis, Nikos

    1993-01-01

    In the present study, a scheme capable of very fast and robust solution of complex nonlinear systems of equations is presented. The Block Adaptive Multigrid (BAM) solution method offers multigrid acceleration and adaptive grid refinement based on a prediction of the solution error. The proposed solution method was used with an implicit upwind Euler solver for the solution of complex transonic flows around airfoils. For two test cases, very fast results were obtained (an 18-fold acceleration of the solution) using one quarter of the volumes of a global grid with the same solution accuracy.
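
    To illustrate the multigrid half of the BAM combination, here is a minimal two-grid correction cycle for the 1-D Poisson problem -u'' = f; the Jacobi smoother, injection restriction, and linear interpolation are textbook choices assumed for the sketch, not the paper's Euler-specific scheme.

        # Two-grid correction for -u'' = f on (0, 1) with u(0) = u(1) = 0.
        import numpy as np

        def jacobi(u, f, h, sweeps):
            for _ in range(sweeps):
                u[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
            return u

        def two_grid(u, f, h, sweeps=3):
            u = jacobi(u, f, h, sweeps)                   # pre-smooth
            r = np.zeros_like(u)                          # residual of -u'' = f
            r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / h**2
            rc = r[::2].copy()                            # restrict (injection)
            ec = jacobi(np.zeros_like(rc), rc, 2 * h, 50) # approx. coarse solve
            e = np.interp(np.arange(len(u)), np.arange(0, len(u), 2), ec)
            u += e                                        # coarse-grid correction
            return jacobi(u, f, h, sweeps)                # post-smooth

        n = 129
        h = 1.0 / (n - 1)
        x = np.linspace(0.0, 1.0, n)
        f = np.pi**2 * np.sin(np.pi * x)                  # exact u = sin(pi x)
        u = np.zeros(n)
        for _ in range(10):
            u = two_grid(u, f, h)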

  17. Solving delay differential equations in S-ADAPT by method of steps.

    PubMed

    Bauer, Robert J; Mo, Gary; Krzyzanski, Wojciech

    2013-09-01

    S-ADAPT is a version of the ADAPT program that contains additional simulation and optimization abilities such as parametric population analysis. S-ADAPT utilizes LSODA to solve ordinary differential equations (ODEs), an algorithm designed for large-dimension non-stiff and stiff problems. However, S-ADAPT does not have a solver for delay differential equations (DDEs). Our objective was to implement in S-ADAPT a DDE solver using the method of steps. The method of steps allows one to solve virtually any DDE system by transforming it to an ODE system. The solver was validated for scalar linear DDEs with one delay and bolus and infusion inputs, for which explicit analytic solutions were derived. Solutions of nonlinear DDE problems coded in S-ADAPT were validated by comparing them with ones obtained by the MATLAB DDE solver dde23. Parameter estimation was tested on population pharmacodynamics data simulated in MATLAB. The S-ADAPT solutions of the DDE problems agreed with both the explicit solutions and the MATLAB solutions to at least 7 significant digits. The population parameter estimates obtained using importance sampling expectation-maximization in S-ADAPT agreed with those used to generate the data. Published by Elsevier Ireland Ltd.
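
    The method of steps itself is easy to sketch: on each delay-length interval the lagged term is a known function, so the DDE becomes an ODE. A minimal Python illustration for a scalar linear DDE follows, using SciPy rather than S-ADAPT; the coefficients are arbitrary.

        # Method of steps for x'(t) = a*x(t) + b*x(t - tau), x(t) = x0 for t <= 0.
        # On each interval [k*tau, (k+1)*tau] the delayed term is known, so a
        # standard ODE solver advances the solution one delay at a time.
        from scipy.integrate import solve_ivp

        a, b, tau, x0, n_steps = -1.0, 0.5, 1.0, 1.0, 5
        history = lambda t: x0              # constant pre-history for t <= 0
        x_end = x0
        for k in range(n_steps):
            t0 = k * tau
            sol = solve_ivp(lambda t, x, h=history: a * x + b * h(t - tau),
                            (t0, t0 + tau), [x_end], dense_output=True)
            # extend the known history by this segment's dense interpolant
            history = lambda t, s=sol, h=history, lo=t0: (
                s.sol(t)[0] if t >= lo else h(t))
            x_end = sol.y[0, -1]
        print(x_end)                        # x(5*tau)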

  18. An Improved Hierarchical Genetic Algorithm for Sheet Cutting Scheduling with Process Constraints

    PubMed Central

    Rao, Yunqing; Qi, Dezhong; Li, Jinling

    2013-01-01

    An improved hierarchical genetic algorithm for the sheet cutting problem, which involves n cutting patterns for m non-identical parallel machines with process constraints, is proposed for the first time in the integrated cutting stock model. The objective of the cutting scheduling problem is to minimize the weighted completion time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony—hierarchical genetic algorithm) is developed for better solutions, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence and resolve local convergence issues, adaptive crossover and mutation probabilities are used in this algorithm. The computational results and comparisons show that the presented approach is quite effective for the considered problem. PMID:24489491
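
    One widely used adaptive-probability scheme of the kind the paper describes lowers crossover/mutation rates for above-average individuals and keeps them high for weak ones; the Srinivas-Patnaik form shown below is an assumption, since the abstract does not give the paper's exact formula.

        # Adaptive crossover/mutation probabilities: strong individuals are
        # perturbed less, weak ones more, fighting premature convergence.
        import numpy as np

        def adaptive_probs(fitness, f_ind, pc_max=0.9, pm_max=0.1):
            """Per-individual rates; assumes higher fitness is better."""
            f_max, f_avg = fitness.max(), fitness.mean()
            if f_ind >= f_avg and f_max > f_avg:
                scale = (f_max - f_ind) / (f_max - f_avg)
                return pc_max * scale, pm_max * scale
            return pc_max, pm_max       # full rates for weak individuals

        pop_fitness = np.array([3.0, 5.5, 7.2, 9.1])
        pc, pm = adaptive_probs(pop_fitness, f_ind=8.0)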

  19. A novel environmental chamber for neuronal network multisite recordings.

    PubMed

    Biffi, E; Regalia, G; Ghezzi, D; De Ceglia, R; Menegon, A; Ferrigno, G; Fiore, G B; Pedrocchi, A

    2012-10-01

    Environmental stability is a critical issue for neuronal networks in vitro. Hence, the ability to control the physical and chemical environment of cell cultures during electrophysiological measurements is an important requirement in the experimental design. In this work, we describe the development and the experimental verification of a closed chamber for multisite electrophysiology and optical monitoring. The chamber provides stable temperature, pH and humidity and guarantees cell viability comparable to standard incubators. Besides, it integrates the electronics for long-term neuronal activity recording. The system is portable and adaptable for multiple network housings, which allows performing parallel experiments in the same environment. Our results show that this device can be a solution for long-term electrophysiology, for dual network experiments and for coupled optical and electrical measurements. Copyright © 2012 Wiley Periodicals, Inc.

  20. SIERRA Low Mach Module: Fuego User Manual Version 4.46.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal/Fluid Team

    2017-09-01

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  1. SIERRA Low Mach Module: Fuego Theory Manual Version 4.44

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal /Fluid Team

    2017-04-01

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  2. SIERRA Low Mach Module: Fuego Theory Manual Version 4.46.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal/Fluid Team

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  3. An improved hierarchical genetic algorithm for sheet cutting scheduling with process constraints.

    PubMed

    Rao, Yunqing; Qi, Dezhong; Li, Jinling

    2013-01-01

    An improved hierarchical genetic algorithm for the sheet cutting problem, which involves n cutting patterns for m non-identical parallel machines with process constraints, is proposed for the first time in the integrated cutting stock model. The objective of the cutting scheduling problem is to minimize the weighted completion time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony--hierarchical genetic algorithm) is developed for better solutions, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence and resolve local convergence issues, adaptive crossover and mutation probabilities are used in this algorithm. The computational results and comparisons show that the presented approach is quite effective for the considered problem.

  4. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application and the performance that was achieved.

  5. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE PAGES

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; ...

    2016-09-29

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application and the performance that was achieved.

  6. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application and the performance that was achieved.

  7. Carpet: Adaptive Mesh Refinement for the Cactus Framework

    NASA Astrophysics Data System (ADS)

    Schnetter, Erik; Hawley, Scott; Hawke, Ian

    2016-11-01

    Carpet is an adaptive mesh refinement and multi-patch driver for the Cactus Framework (ascl:1102.013). Cactus is a software framework for solving time-dependent partial differential equations on block-structured grids, and Carpet acts as driver layer providing adaptive mesh refinement, multi-patch capability, as well as parallelization and efficient I/O.

  8. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized and combined with local preconditioning to construct an implicit parallel solver for obtaining steady-state solutions of the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain decomposition is used to partition the computational domain among the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and its solutions are found to agree exactly with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine for a problem size of 1024 K equations (256 K grid points).
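
    A serial sketch of the preconditioned GMRES kernel that the paper parallelizes, using SciPy with an incomplete-LU preconditioner; the 1-D Laplacian test matrix and drop tolerance are illustrative only.

        # Preconditioned GMRES: an ILU factorization supplies M ~ A^-1.
        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        n = 1000
        A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
        b = np.ones(n)

        ilu = spla.spilu(A, drop_tol=1e-4)                 # incomplete LU
        M = spla.LinearOperator((n, n), matvec=ilu.solve)  # preconditioner
        x, info = spla.gmres(A, b, M=M, restart=30)        # info == 0: converged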

  9. Development of a web-based tool for automated processing and cataloging of a unique combinatorial drug screen.

    PubMed

    Dalecki, Alex G; Wolschendorf, Frank

    2016-07-01

    Facing totally resistant bacteria, traditional drug discovery efforts have proven to be of limited use in replenishing our depleted arsenal of therapeutic antibiotics. Recently, the natural anti-bacterial properties of metal ions in synergy with metal-coordinating ligands have shown potential for generating new molecule candidates with potential therapeutic downstream applications. We recently developed a novel combinatorial screening approach to identify compounds with copper-dependent anti-bacterial properties. Through a parallel screening technique, the assay distinguishes between copper-dependent and copper-independent activities against Mycobacterium tuberculosis, with hits being defined as compounds with copper-dependent activities. These activities must then be linked to a compound master list to process and analyze the data and to identify the hit molecules, a labor-intensive and mistake-prone analysis. Here, we describe a software program built to automate this analysis in order to streamline our workflow significantly. We conducted a small, 1440-compound screen against M. tuberculosis and used it as an example framework to build and optimize the software. Though specifically adapted to our own needs, it can be readily expanded for any small- to medium-throughput screening effort, parallel or conventional. Further, by virtue of the underlying Linux server, it can be easily adapted for chemoinformatic analysis of screens through packages such as OpenBabel. Overall, this setup represents an easy-to-use solution for streamlining the processing and analysis of biological screening data, as well as offering a scaffold for ready functionality expansion. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Exploring types of play in an adapted robotics program for children with disabilities.

    PubMed

    Lindsay, Sally; Lam, Ashley

    2018-04-01

    Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, of whom all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children showed mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary, play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO ® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO ® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence children's ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.

  11. Adaptive Nulling for the Terrestrial Planet Finder Interferometer

    NASA Technical Reports Server (NTRS)

    Peters, Robert D.; Lay, Oliver P.; Jeganathan, Muthu; Hirai, Akiko

    2006-01-01

    A description of adaptive nulling for the Terrestrial Planet Finder Interferometer (TPF-I) is presented. The topics include: 1) Nulling in TPF-I; 2) Why Do Adaptive Nulling; 3) Parallel High-Order Compensator Design; 4) Phase and Amplitude Control; 5) Development Activities; 6) Requirements; 7) Simplified Experimental Setup; 8) Intensity Correction; and 9) Intensity Dispersion Stability. A short summary is also given on adaptive nulling for the TPF-I.

  12. Contrast adaptation in cat visual cortex is not mediated by GABA.

    PubMed

    DeBruyn, E J; Bonds, A B

    1986-09-24

    The possible involvement of gamma-aminobutyric acid (GABA) in contrast adaptation in single cells in area 17 of the cat was investigated. Iontophoretic application of N-methyl bicuculline increased cell responses, but had no effect on the magnitude of adaptation. These results suggest that contrast adaptation is the result of inhibition through a parallel pathway, but that GABA does not mediate this process.

  13. Phase reconstruction using compressive two-step parallel phase-shifting digital holography

    NASA Astrophysics Data System (ADS)

    Ramachandran, Prakash; Alex, Zachariah C.; Nelleri, Anith

    2018-04-01

    The linear relationship between the sample complex object wave and its approximated complex Fresnel field obtained using single-shot parallel phase-shifting digital holograms (PPSDH) is used in a compressive sensing framework, and an accurate phase reconstruction is demonstrated. It is shown that the accuracy of phase reconstruction of this method is better than that of the compressive-sensing-adapted single exposure inline holography (SEOL) method. It is derived that the measurement model of the PPSDH method retains both the real and imaginary parts of the Fresnel field, though with approximation noise, whereas the measurement model of SEOL retains only the real part of the complex Fresnel field and its imaginary part is entirely unavailable. Numerical simulations are performed for CS-adapted PPSDH and CS-adapted SEOL, and it is demonstrated that the phase reconstruction is accurate for CS-adapted PPSDH and can be used for single-shot digital holographic reconstruction.

  14. PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Saini, Subhash (Technical Monitor)

    1998-01-01

    Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper presents the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
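
    The processor-assignment step can be posed as an assignment problem: keep as much data in place as possible, so that redistribution cost is minimized. The sketch below computes the optimal assignment with the Hungarian algorithm; PLUM itself uses a faster heuristic, and the small overlap matrix is illustrative only.

        # Remapping in outline: overlap[p, q] is how much of new partition q
        # already resides on processor p; maximize the data left in place.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        overlap = np.array([[9, 1, 0],
                            [2, 7, 3],
                            [0, 4, 8]])
        procs, parts = linear_sum_assignment(overlap, maximize=True)
        moved = overlap.sum() - overlap[procs, parts].sum()  # data to move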

  15. An efficient parallel algorithm for the solution of a tridiagonal linear system of equations

    NASA Technical Reports Server (NTRS)

    Stone, H. S.

    1971-01-01

    Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log2 N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
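
    The recursive doubling idea is compact enough to show directly. For the first-order recurrence x[i] = a[i]x[i-1] + b[i], the building block of the tridiagonal algorithm, each doubling sweep folds in dependencies twice as far back, so log2 N fully parallel sweeps suffice. A vectorized NumPy sketch, with the example values being arbitrary:

        # Recursive doubling for x[i] = a[i]*x[i-1] + b[i], x[0] = b[0].
        import numpy as np

        def recurrence_doubling(a, b):
            A, B = a.astype(float).copy(), b.astype(float).copy()
            A[0] = 0.0                    # x[0] depends on nothing
            step, n = 1, len(b)
            while step < n:
                # each pair of vectorized updates is one parallel sweep
                B_new, A_new = B.copy(), A.copy()
                B_new[step:] = A[step:] * B[:-step] + B[step:]
                A_new[step:] = A[step:] * A[:-step]
                A, B = A_new, B_new
                step *= 2
            return B                      # B[i] now equals x[i]

        a = np.array([0.0, 0.5, 0.5, 0.5])
        b = np.array([1.0, 1.0, 1.0, 1.0])
        x = recurrence_doubling(a, b)     # [1.0, 1.5, 1.75, 1.875]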

  16. FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

    NASA Astrophysics Data System (ADS)

    Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

    The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.

  17. Multigrid solution of internal flows using unstructured solution adaptive meshes

    NASA Technical Reports Server (NTRS)

    Smith, Wayne A.; Blake, Kenneth R.

    1992-01-01

    This is the final report of the NASA Lewis SBIR Phase 2 Contract Number NAS3-25785, Multigrid Solution of Internal Flows Using Unstructured Solution Adaptive Meshes. The objective of this project, as described in the Statement of Work, is to develop and deliver to NASA a general three-dimensional Navier-Stokes code using unstructured solution-adaptive meshes for accuracy and multigrid techniques for convergence acceleration. The code will primarily be applied, but not necessarily limited, to high speed internal flows in turbomachinery.

  18. Solutions of large-scale electromagnetics problems involving dielectric objects with the parallel multilevel fast multipole algorithm.

    PubMed

    Ergül, Özgür

    2011-11-01

    Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.

  19. Efficient parallel implementations of QM/MM-REMD (quantum mechanical/molecular mechanics-replica-exchange MD) and umbrella sampling: isomerization of H2O2 in aqueous solution.

    PubMed

    Fedorov, Dmitri G; Sugita, Yuji; Choi, Cheol Ho

    2013-07-03

    An efficient parallel implementation of QM/MM-based replica-exchange molecular dynamics (REMD) as well as umbrella sampling techniques is proposed, adopting the generalized distributed data interface (GDDI). A parallelization speed-up of 40.5 on 48 cores was achieved, making our QM/MM-MD engine a robust tool for studying complex chemical dynamics in solution. The two techniques were used comparatively to study the torsional isomerization of hydrogen peroxide in aqueous solution. QM/MM-REMD and QM/MM umbrella sampling yielded nearly identical potentials of mean force (PMFs) regardless of the particular QM theory used for the solute, showing that the overall dynamics are mainly determined by solvation. Although an entropic penalty from solvent rearrangement exists for cisoid conformers, it was found that both strong intermolecular hydrogen bonding and dipole-dipole interactions preferentially stabilize them in solution, reducing the torsional free-energy barrier at 0° by about 3 kcal/mol as compared to that in the gas phase.
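
    Independent of the GDDI parallelization, the replica-exchange step at the heart of REMD is a simple Metropolis test between neighboring temperatures. A minimal sketch follows; the temperatures and energy values are illustrative, and units are assumed consistent.

        # Exchange replicas i and j with probability min(1, exp(delta)),
        # where delta = (beta_i - beta_j) * (E_i - E_j) and beta = 1/kT.
        import numpy as np

        def try_swap(E, beta, i, j, rng):
            delta = (beta[i] - beta[j]) * (E[i] - E[j])
            if delta >= 0 or rng.random() < np.exp(delta):
                E[i], E[j] = E[j], E[i]   # accepted: exchange configurations
                return True
            return False

        rng = np.random.default_rng(1)
        beta = 1.0 / np.array([300.0, 330.0, 363.0])  # 1/kT per replica
        E = np.array([-120.0, -115.0, -108.0])        # current energies
        try_swap(E, beta, 0, 1, rng)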

  20. A Model for Speedup of Parallel Programs

    DTIC Science & Technology

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static...

  1. Parallel Guessing: A Strategy for High-Speed Computation

    DTIC Science & Technology

    1984-09-19

    for using additional hardware to obtain higher processing speed). In this paper we argue that parallel guessing for image analysis is a useful... from a true solution, or the correctness of a guess, can be readily checked. We review image-analysis algorithms having a parallel guessing or...

  2. Kinematics and dynamics of robotic systems with multiple closed loops

    NASA Astrophysics Data System (ADS)

    Zhang, Chang-De

    The kinematics and dynamics of robotic systems with multiple closed loops, such as Stewart platforms, walking machines, and hybrid manipulators, are studied. In the study of kinematics, focus is on the closed-form solutions of the forward position analysis of different parallel systems. A closed-form solution means that the solution is expressed as a polynomial in one variable. If the order of the polynomial is less than or equal to four, the solution has analytical closed-form. First, the conditions of obtaining analytical closed-form solutions are studied. For a Stewart platform, the condition is found to be that one rotational degree of freedom of the output link is decoupled from the other five. Based on this condition, a class of Stewart platforms which has analytical closed-form solution is formulated. Conditions of analytical closed-form solution for other parallel systems are also studied. Closed-form solutions of forward kinematics for walking machines and multi-fingered grippers are then studied. For a parallel system with three three-degree-of-freedom subchains, there are 84 possible ways to select six independent joints among nine joints. These 84 ways can be classified into three categories: Category 3:3:0, Category 3:2:1, and Category 2:2:2. It is shown that the first category has no solutions; the solutions of the second category have analytical closed-form; and the solutions of the last category are higher order polynomials. The study is then extended to a nearly general Stewart platform. The solution is a 20th order polynomial and the Stewart platform has a maximum of 40 possible configurations. Also, the study is extended to a new class of hybrid manipulators which consists of two serially connected parallel mechanisms. In the study of dynamics, a computationally efficient method for inverse dynamics of manipulators based on the virtual work principle is developed. Although this method is comparable with the recursive Newton-Euler method for serial manipulators, its advantage is more noteworthy when applied to parallel systems. An approach of inverse dynamics of a walking machine is also developed, which includes inverse dynamic modeling, foot force distribution, and joint force/torque allocation.

  3. A Two-dimensional Version of the Niblett-Bostick Transformation for Magnetotelluric Interpretations

    NASA Astrophysics Data System (ADS)

    Esparza, F.

    2005-05-01

    An imaging technique for two-dimensional magnetotelluric interpretations is developed following the well known Niblett-Bostick transformation for one-dimensional profiles. The algorithm uses a Hopfield artificial neural network to process series and parallel magnetotelluric impedances along with their analytical influence functions. The adaptive, weighted average approximation preserves part of the nonlinearity of the original problem. No initial model in the usual sense is required for the recovery of a functional model. Rather, the built-in relationship between model and data considers automatically, all at the same time, many half spaces whose electrical conductivities vary according to the data. The use of series and parallel impedances, a self-contained pair of invariants of the impedance tensor, avoids the need to decide on best angles of rotation for TE and TM separations. Field data from a given profile can thus be fed directly into the algorithm without much processing. The solutions offered by the Hopfield neural network correspond to spatial averages computed through rectangular windows that can be chosen at will. Applications of the algorithm to simple synthetic models and to the COPROD2 data set illustrate the performance of the approximation.

  4. Flexbar 3.0 - SIMD and multicore parallelization.

    PubMed

    Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut

    2017-09-15

    High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing is done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and additionally SIMD vectorization. Both types of parallelism are used to speed up the computation of pairwise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
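
    The core operation Flexbar accelerates is adapter recognition by aligning the read's tail against the adapter. A toy exact-match version conveys the idea; real tools, Flexbar included, score overlaps with alignments that allow mismatches.

        # 3'-adapter trimming by overlap detection (exact matches only).
        def trim_adapter(read, adapter, min_overlap=3):
            """Remove an adapter suffix: the read's tail must match a
            prefix of the adapter of length >= min_overlap."""
            for k in range(min(len(read), len(adapter)), min_overlap - 1, -1):
                if read.endswith(adapter[:k]):
                    return read[:-k]
            return read

        print(trim_adapter("ACGTACGTAGATCGGA", "AGATCGGAAGAGC"))  # ACGTACGT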

  5. Development of an Unstructured Mesh Code for Flows About Complete Vehicles

    NASA Technical Reports Server (NTRS)

    Peraire, Jaime; Gupta, K. K. (Technical Monitor)

    2001-01-01

    This report describes the research work undertaken at the Massachusetts Institute of Technology under NASA Research Grant NAG4-157. The aim of this research is to identify effective algorithms and methodologies for the efficient and routine solution of flow simulations about complete vehicle configurations. For over ten years we have received support from NASA to develop unstructured mesh methods for Computational Fluid Dynamics. As a result of this effort, a methodology based on the use of unstructured adapted meshes of tetrahedra and finite volume flow solvers has been developed. A number of gridding algorithms, flow solvers, and adaptive strategies have been proposed. The most successful algorithms form the basis of the unstructured mesh system FELISA. The FELISA system has been used extensively for the analysis of transonic and hypersonic flows about complete vehicle configurations. The system is highly automatic and allows for the routine aerodynamic analysis of complex configurations starting from CAD data. The code has been parallelized and utilizes efficient solution algorithms. For hypersonic flows, a version of the code which incorporates real gas effects has been produced. The FELISA system is also a component of the STARS aeroservoelastic system developed at NASA Dryden. One of the latest developments before the start of this grant was to extend the system to include viscous effects. This required the development of viscous mesh generators, capable of generating the anisotropic grids required to represent boundary layers, and viscous flow solvers. We show some sample hypersonic viscous computations using the developed viscous generators and solvers. Although these initial results were encouraging, it became apparent that in order to develop a fully functional capability for viscous flows, several advances in solution accuracy, robustness and efficiency were required. In this grant we set out to investigate some novel methodologies that could lead to the required improvements. In particular we focused on two fronts: (1) finite element methods and (2) iterative algebraic multigrid solution techniques.

  6. Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

    NASA Astrophysics Data System (ADS)

    Gassmöller, Rene; Bangerth, Wolfgang

    2016-04-01

    Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
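
    Stripped of parallelism and mesh adaptivity, the particle-in-cell bookkeeping the paper generalizes amounts to three steps: advect particles, reassign them to cells, and project carried properties onto the mesh. A uniform-grid serial sketch follows; the flow field and the carried property are stand-ins.

        # Minimal particle-in-cell bookkeeping on a 1-D periodic domain.
        import numpy as np

        rng = np.random.default_rng(2)
        ncell, npart, dt = 16, 10_000, 0.01
        x = rng.random(npart)             # particle positions in [0, 1)
        prop = rng.random(npart)          # carried property, e.g. composition

        velocity = lambda x: 0.5 + 0.4 * np.sin(2 * np.pi * x)  # assumed flow
        x = (x + dt * velocity(x)) % 1.0  # advect, periodic wrap
        cell = np.minimum((x * ncell).astype(int), ncell - 1)   # reassign cells

        # project onto the mesh: per-cell average of the carried property
        cell_mean = (np.bincount(cell, weights=prop, minlength=ncell)
                     / np.maximum(np.bincount(cell, minlength=ncell), 1))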

  7. A Domain-Decomposed Multilevel Method for Adaptively Refined Cartesian Grids with Embedded Boundaries

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Adomavicius, G.

    2000-01-01

    Preliminary verification and validation of an efficient Euler solver for adaptively refined Cartesian meshes with embedded boundaries is presented. The parallel, multilevel method makes use of a new on-the-fly parallel domain decomposition strategy based upon the use of space-filling curves, and automatically generates a sequence of coarse meshes for processing by the multigrid smoother. The coarse mesh generation algorithm produces grids which completely cover the computational domain at every level in the mesh hierarchy. A series of examples on realistically complex three-dimensional configurations demonstrates that this new coarsening algorithm reliably achieves mesh coarsening ratios in excess of 7 on adaptively refined meshes. Numerical investigations of the scheme's local truncation error demonstrate an achieved order of accuracy between 1.82 and 1.88. Convergence results for the multigrid scheme are presented for both subsonic and transonic test cases and demonstrate W-cycle multigrid convergence rates between 0.84 and 0.94. Preliminary parallel scalability tests on both simple wing and complex complete aircraft geometries show a computational speedup of 52 on 64 processors using the run-time mesh partitioner.
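
    The space-filling-curve decomposition can be sketched compactly: assign each cell a 1-D key along the curve, sort, and cut the ordering into equal contiguous chunks, one per processor. The Morton (Z-order) curve is assumed here for simplicity; the abstract does not state which curve the paper uses.

        # Domain decomposition by space-filling curve: Morton sort + cut.
        import numpy as np

        def morton2d(i, j, bits=16):
            """Interleave the bits of integer cell coordinates (i, j)."""
            key = 0
            for b in range(bits):
                key |= ((i >> b) & 1) << (2 * b) | ((j >> b) & 1) << (2 * b + 1)
            return key

        cells = [(i, j) for i in range(8) for j in range(8)]
        order = sorted(range(len(cells)), key=lambda c: morton2d(*cells[c]))
        nproc = 4
        chunks = np.array_split(np.array(order), nproc)  # one run per processor

    Because the curve preserves spatial locality, each contiguous chunk tends to be a compact subdomain with a small communication surface.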

  8. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    NASA Astrophysics Data System (ADS)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high-resolution and high-quality video compression technologies such as H.264. Such solutions provide not only exceptional quality but also efficiency, low power, and low latency, previously unattainable in software-based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in the H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color subsampling patterns like YUV 4:2:2 or 4:4:4 formats. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264-compliant deblocking filter for multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub-blocks, and pixel rows are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs. The DFM serves the data required by the different numbers of DFUs, and also manages all the neighboring data required for future data processing by the DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.

  9. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  10. Parallel Monotonic Basin Hopping for Low Thrust Trajectory Optimization

    NASA Technical Reports Server (NTRS)

    McCarty, Steven L.; McGuire, Melissa L.

    2018-01-01

    Monotonic Basin Hopping has been shown to be an effective method of solving low-thrust trajectory optimization problems. This paper outlines an extension to the common serial implementation by parallelizing it over any number of available compute cores. The Parallel Monotonic Basin Hopping algorithm described herein is shown to be an effective way to more quickly locate feasible solutions and to improve locally optimal solutions in an automated way without requiring a feasible initial guess. The increased speed achieved through parallelization enables the algorithm to be applied to more complex problems that would otherwise be impractical for a serial implementation. Low-thrust cislunar transfers and a hybrid Mars example case demonstrate the effectiveness of the algorithm. Finally, a preliminary scaling study quantifies the expected decrease in solve time compared to a serial implementation.
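
    A minimal parallel Monotonic Basin Hopping sketch: independent hoppers, each accepting only improving minima (the "monotonic" rule), run on separate cores, and the best result is kept. The Rastrigin objective, step size, and hop counts are illustrative assumptions, not the paper's flight-dynamics setup.

        # Parallel Monotonic Basin Hopping with independent hoppers.
        import numpy as np
        from multiprocessing import Pool
        from scipy.optimize import minimize

        def objective(x):                 # Rastrigin: many local minima
            return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

        def hopper(seed, n_hops=50, step=0.8):
            rng = np.random.default_rng(seed)
            best = minimize(objective, rng.uniform(-5, 5, size=2))
            for _ in range(n_hops):
                trial = minimize(objective, best.x + rng.normal(0, step, 2))
                if trial.fun < best.fun:  # monotonic: accept improvements only
                    best = trial
            return best.fun, best.x

        if __name__ == "__main__":
            with Pool(4) as pool:         # one hopper per core
                results = pool.map(hopper, range(4))
            print(min(results, key=lambda r: r[0]))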

  11. On the maintenance of genetic variation and adaptation to environmental change: considerations from population genomics in fishes.

    PubMed

    Bernatchez, L

    2016-12-01

    The first goal of this paper was to overview modern approaches to local adaptation, with a focus on the use of population genomics data to detect signals of natural selection in fishes. Several mechanisms are discussed that may enhance the maintenance of genetic variation and evolutionary potential, which have been overlooked and should be considered in future theoretical development and predictive models: the prevalence of soft sweeps, polygenic basis of adaptation, balancing selection and transient polymorphisms, parallel evolution, as well as epigenetic variation. Research on fish population genomics has provided ample evidence for local adaptation at the genome level. Pervasive adaptive evolution, however, seems to almost never involve the fixation of beneficial alleles. Instead, adaptation apparently proceeds most commonly by soft sweeps entailing shifts in frequencies of alleles being shared between differentially adapted populations. One obvious factor contributing to the maintenance of standing genetic variation in the face of selective pressures is that adaptive phenotypic traits are most often highly polygenic, and consequently the response to selection should derive mostly from allelic co-variances among causative loci rather than pronounced allele frequency changes. Balancing selection in its various forms may also play an important role in maintaining adaptive genetic variation and the evolutionary potential of species to cope with environmental change. A large body of literature on fishes also shows that repeated evolution of adaptive phenotypes is a ubiquitous evolutionary phenomenon that seems to occur most often via different genetic solutions, further adding to the potential options of species to cope with a changing environment. Moreover, a paradox is emerging from recent fish studies whereby populations of highly reduced effective population sizes and impoverished genetic diversity can apparently retain their adaptive potential in some circumstances. Although more empirical support is needed, several recent studies suggest that epigenetic variation could account for this apparent paradox. Therefore, epigenetic variation should be fully integrated with considerations pertaining to role of soft sweeps, polygenic and balancing selection, as well as repeated adaptation involving different genetic basis towards improving models predicting the evolutionary potential of species to cope with a changing world. © 2016 The Fisheries Society of the British Isles.

  12. Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model

    NASA Astrophysics Data System (ADS)

    Kumar, M.; Duffy, C.

    2006-05-01

    Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface and subsurface properties, and meteorological forcings. The computational cost and complexity of these models increase with their tendency to accurately simulate the large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with a limited number of physical processes imposes a smaller computational load, but this negatively affects the accuracy of model results and restricts the physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio-temporal resolution), accuracy, and predictive uncertainty in relation to various approximations of physical processes; (b) which can be applied adaptively at different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables; and (c) which is flexible enough to incorporate different numbers and approximations of process equations depending on model purpose and computational constraints. An efficient implementation of this strategy becomes all the more important for the Great Salt Lake river basin, which is relatively large (~89000 sq. km) and complex in terms of hydrologic and geomorphic conditions. Also, the types and the time scales of hydrologic processes which are dominant in different parts of the basin are different. Part of the snowmelt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries. The adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch front. Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with the least computational burden. However, performing these simulations and the related calibration of these models over a large basin at higher spatio-temporal resolutions is computationally intensive and requires increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of the existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of a large number of processors, thereby reducing the time needed to obtain a solution. The paper also discusses the implementation of the integrated model on parallel processors, the mapping of the problem onto a multi-processor environment, methods to incorporate coupling between hydrologic processes using interprocessor communication models, model data structures, and parallel numerical algorithms to obtain high performance.

  13. Adaptive Environment for Supercompiling with Optimized Parallelism (AESOP)

    DTIC Science & Technology

    2011-09-01

    [Only report front matter survives in this record: a final report covering 9 March 2009 - 31 July 2011, whose listed contents include a system characterization loop, integration points for AESOP, and LLVM and the AESOP compiler.]

  14. Similarity solutions of time-dependent relativistic radiation-hydrodynamical plane-parallel flows

    NASA Astrophysics Data System (ADS)

    Fukue, Jun

    2018-04-01

    Similarity solutions are examined for the frequency-integrated relativistic radiation-hydrodynamical flows, which are described by the comoving quantities. The flows are vertical plane-parallel time-dependent ones with a gray opacity coefficient. For adequate boundary conditions, the flows are accelerated in a somewhat homologous manner, but terminate at some singular locus, which originates from the pathological behavior in relativistic radiation moment equations truncated in finite orders.

  15. Similarity solutions of time-dependent relativistic radiation-hydrodynamical plane-parallel flows

    NASA Astrophysics Data System (ADS)

    Fukue, Jun

    2018-06-01

    Similarity solutions are examined for the frequency-integrated relativistic radiation-hydrodynamical flows, which are described by the comoving quantities. The flows are vertical plane-parallel time-dependent ones with a gray opacity coefficient. For adequate boundary conditions, the flows are accelerated in a somewhat homologous manner, but terminate at some singular locus, which originates from the pathological behavior in relativistic radiation moment equations truncated in finite orders.

  16. Intelligent flight control systems

    NASA Technical Reports Server (NTRS)

    Stengel, Robert F.

    1993-01-01

    The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall in three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms.

  17. Similar traits, different genes? Examining convergent evolution in related weedy rice populations

    USDA-ARS's Scientific Manuscript database

    Convergent phenotypic evolution may or may not be associated with parallel genotypic evolution. Agricultural weeds have repeatedly been selected for weed-adaptive traits such as rapid growth, increased seed dispersal and dormancy, thus providing an ideal system for the study of parallel evolution. H...

  18. New machining method of high precision infrared window part

    NASA Astrophysics Data System (ADS)

    Yang, Haicheng; Su, Ying; Xu, Zengqi; Guo, Rui; Li, Wenting; Zhang, Feng; Liu, Xuanmin

    2016-10-01

    Most spherical shells of multifunctional photoelectric instruments are designed with multiple optical channels, typically TV, laser, and infrared, to accommodate the different wavebands of the sensors. Without affecting the optical aperture, wind resistance, or aerodynamic performance of the optical system, the overall layout of the spherical shell was optimized to save space and reduce weight. Most of the optical windows have special (non-circular) shapes, each optical window participates directly in the high-resolution imaging of its corresponding sensor system, and the optical axis parallelism of the sensors must meet an accuracy requirement of 0.05 mrad. The machining quality of the optical window parts therefore directly affects the pointing accuracy and interchangeability of the photoelectric system. Processing and testing of the TV and laser windows are very mature, but infrared window parts, because of the special nature of the material (transparency and a high refractive index), raise problems with imaging quality and with controlling the minimum focal length and arc-second-level parallelism during processing. Based on years of practical experience, this paper focuses on how to control the figure and parallelism accuracy of infrared window parts during processing. The single-pass yield was increased from 40% to more than 95%, the processing efficiency was significantly enhanced, and a bottleneck in actual research and production was effectively resolved.

  19. Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moryakov, A. V., E-mail: sailor@orc.ru

    2016-12-15

    An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is notable for its simplicity and for the possibility of solving nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iteration.
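
    The abstract gives no implementation details, but the core idea of representing the solution of a linear ODE system by a truncated series of orthogonal polynomials on [0, 1] can be sketched as follows. This is our own minimal collocation illustration in Python/NumPy, not the EDELWEISS algorithm; the function name and the choice of Gauss-Legendre collocation nodes are assumptions.

        import numpy as np
        from numpy.polynomial import legendre as L

        def solve_linear_ode_legendre(A, y0, K=12):
            """Solve y'(t) = A y(t) on [0, 1] with y(0) = y0, representing the
            solution as a truncated Legendre series y(t) = sum_k c_k P_k(2t - 1)
            and enforcing the ODE at K Gauss-Legendre collocation nodes."""
            A = np.asarray(A, float)
            y0 = np.asarray(y0, float)
            n = y0.size
            u, _ = L.leggauss(K)                     # K nodes in (-1, 1)
            P = L.legvander(u, K)                    # P[j, k] = P_k(u_j)
            I = np.eye(K + 1)
            # d/dt P_k(2t - 1) = 2 P_k'(u) by the chain rule.
            D = 2.0 * np.stack([L.legval(u, L.legder(I[k])) for k in range(K + 1)], 1)
            P0 = L.legvander(np.array([-1.0]), K)    # series values at t = 0
            # Unknown coefficient matrix C (K+1 x n): D C = P C A^T plus P0 C = y0.
            M = np.vstack([np.kron(D, np.eye(n)) - np.kron(P, A),
                           np.kron(P0, np.eye(n))])
            rhs = np.concatenate([np.zeros(K * n), y0])
            C = np.linalg.solve(M, rhs).reshape(K + 1, n)
            return lambda t: L.legval(2.0 * np.asarray(t) - 1.0, C)

        # Harmonic oscillator: y(1) should be close to [cos(1), -sin(1)].
        y = solve_linear_ode_legendre([[0.0, 1.0], [-1.0, 0.0]], [1.0, 0.0])
        print(y(1.0), [np.cos(1.0), -np.sin(1.0)])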

  20. Micropore extrusion-induced alignment transition from perpendicular to parallel of cylindrical domains in block copolymers.

    PubMed

    Qu, Ting; Zhao, Yongbin; Li, Zongbo; Wang, Pingping; Cao, Shubo; Xu, Yawei; Li, Yayuan; Chen, Aihua

    2016-02-14

    The orientation transition from perpendicular to parallel alignment of PEO cylindrical domains in PEO-b-PMA(Az) films has been demonstrated by extruding the block copolymer (BCP) solutions through a micropore of a plastic gastight syringe. The parallel orientation of the PEO domains induced by this micropore extrusion can be returned to perpendicular alignment via ultrasonication of the extruded BCP solutions and subsequent annealing. A plausible mechanism is proposed in this study. The BCP films can be used as templates to prepare nanowire arrays with controlled layers, which have enormous potential applications in the field of integrated circuits.

  1. Domain decomposition methods for the parallel computation of reacting flows

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1988-01-01

    Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
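
    For readers who want to experiment, the serial method the paper judges best (GMRES with an ILU-type preconditioner) is easy to reproduce with SciPy on a stand-in sparse system. This is our illustration, not the paper's reacting-flow Jacobian, and SciPy's spilu is a pointwise ILU rather than the block-ILU used in the paper.

        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        # Stand-in sparse system: a 2-D Poisson stencil in place of the paper's
        # reacting-flow Jacobian, which we do not have.
        n = 32
        T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
        A = sp.kronsum(T, T).tocsc()
        b = np.ones(A.shape[0])

        # Incomplete LU factorization wrapped as a preconditioner for GMRES.
        ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
        M = spla.LinearOperator(A.shape, matvec=ilu.solve)

        iters = []
        x, info = spla.gmres(A, b, M=M, rtol=1e-10,  # older SciPy: tol= instead of rtol=
                             callback=lambda rk: iters.append(rk))
        print(info, len(iters), np.linalg.norm(A @ x - b))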

  2. Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

    PubMed

    Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas

    2015-01-01

    Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
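
    The kernel operation TINGe parallelizes is the pairwise mutual information computation. As a rough sketch (ours, using a plain histogram estimator rather than TINGe's B-spline estimator, and omitting the permutation testing), the gene-pair loop below is the embarrassingly parallel part that the paper maps onto the Xeon Phi's cores, threads, and vector units:

        import numpy as np

        def mutual_information(x, y, bins=16):
            # Plug-in histogram estimate of mutual information (in nats). TINGe
            # uses a B-spline estimator plus permutation testing, neither of
            # which is reproduced here.
            pxy, _, _ = np.histogram2d(x, y, bins=bins)
            pxy /= pxy.sum()
            px = pxy.sum(axis=1, keepdims=True)
            py = pxy.sum(axis=0, keepdims=True)
            nz = pxy > 0
            return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

        rng = np.random.default_rng(0)
        expr = rng.normal(size=(50, 300))      # 50 genes x 300 experiments (synthetic)
        mi = np.zeros((50, 50))
        for i in range(50):                    # this all-pairs loop is the part that
            for j in range(i + 1, 50):         # gets spread across cores and threads
                mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j])
        print(mi.max())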

  3. A Review of Lightweight Thread Approaches for High Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castello, Adrian; Pena, Antonio J.; Seo, Sangmin

    High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures, and conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns, and that those LWT libraries perform better than OS-thread-based solutions for the task and nested parallelism patterns that are becoming more popular with new architectures.

  4. Accelerating large-scale protein structure alignments with graphics processing units

    PubMed Central

    2012-01-01

    Background: Large-scale protein structure alignment, an indispensable tool for structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign can incorporate many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues of protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs. PMID:22357132

  5. Series and parallel arc-fault circuit interrupter tests.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Jay Dean; Fresquez, Armando J.; Gudgel, Bob

    2013-07-01

    While the 2011 National Electrical Code® (NEC) only requires series arc-fault protection, some arc-fault circuit interrupter (AFCI) manufacturers are designing products to detect and mitigate both series and parallel arc-faults. Sandia National Laboratories (SNL) has extensively investigated the electrical differences of series and parallel arc-faults and has offered possible classification and mitigation solutions. As part of this effort, Sandia National Laboratories has collaborated with MidNite Solar to create and test a 24-string combiner box with an AFCI which detects, differentiates, and de-energizes series and parallel arc-faults. In the case of the MidNite AFCI prototype, series arc-faults are mitigated by opening the PV strings, whereas parallel arc-faults are mitigated by shorting the array. A range of different experimental series and parallel arc-fault tests with the MidNite combiner box were performed at the Distributed Energy Technologies Laboratory (DETL) at SNL in Albuquerque, NM. In all the tests, the prototype de-energized the arc-faults in the time period required by the arc-fault circuit interrupt testing standard, UL 1699B. The experimental tests confirm series and parallel arc-faults can be successfully mitigated with a combiner box-integrated solution.

  6. Xyce

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient modes using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load-balancing and iterative solvers.

  7. A Proposed Solution to the Problem with Using Completely Random Data to Assess the Number of Factors with Parallel Analysis

    ERIC Educational Resources Information Center

    Green, Samuel B.; Levy, Roy; Thompson, Marilyn S.; Lu, Min; Lo, Wen-Juo

    2012-01-01

    A number of psychometricians have argued for the use of parallel analysis to determine the number of factors. However, parallel analysis must be viewed at best as a heuristic approach rather than a mathematically rigorous one. The authors suggest a revision to parallel analysis that could improve its accuracy. A Monte Carlo study is conducted to…

  8. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered-grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, the propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program, and the numerical results are verified against the analytical solutions. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. The corresponding best parallel efficiencies are 76.2% and 73.5%.
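
    Cyclic reduction itself is compact enough to sketch. The version below (our serial Python/NumPy illustration, assuming a system size of 2^m - 1) halves the number of unknowns at each level; the updates within a level are mutually independent, which is exactly what the paper's OpenMP implementation exploits:

        import numpy as np

        def cyclic_reduction(a, b, c, d):
            """Solve a tridiagonal system by cyclic reduction.
            a: sub-diagonal (a[0] ignored), b: diagonal, c: super-diagonal
            (c[-1] ignored), d: right-hand side. Assumes n = 2**m - 1."""
            a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
            n = len(b)
            m = int(np.log2(n + 1))
            assert 2**m - 1 == n
            a[0] = c[-1] = 0.0
            for level in range(1, m):            # forward reduction
                s, h = 2**level, 2**(level - 1)
                for i in range(s - 1, n, s):     # independent -> parallel in OpenMP
                    al = -a[i] / b[i - h]
                    be = -c[i] / b[i + h] if i + h < n else 0.0
                    b[i] += al * c[i - h] + (be * a[i + h] if i + h < n else 0.0)
                    d[i] += al * d[i - h] + (be * d[i + h] if i + h < n else 0.0)
                    a[i] = al * a[i - h]
                    c[i] = be * c[i + h] if i + h < n else 0.0
            x = np.zeros(n)
            for level in range(m, 0, -1):        # back substitution
                s, h = 2**level, 2**(level - 1)
                for i in range(h - 1, n, s):     # also independent within a level
                    xl = x[i - h] if i - h >= 0 else 0.0
                    xr = x[i + h] if i + h < n else 0.0
                    x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
            return x

        # Check against a dense solve on a random diagonally dominant system.
        rng = np.random.default_rng(1)
        n = 2**10 - 1
        a, c, d = rng.normal(size=(3, n))
        b = 4.0 + rng.normal(size=n)
        A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
        print(np.max(np.abs(cyclic_reduction(a, b, c, d) - np.linalg.solve(A, d))))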

  9. Simplified Production of Organic Compounds Containing High Enantiomer Excesses

    NASA Technical Reports Server (NTRS)

    Cooper, George W. (Inventor)

    2015-01-01

    The present invention is directed to a method for making an enantiomeric organic compound having a high amount of enantiomer excesses including the steps of a) providing an aqueous solution including an initial reactant and a catalyst; and b) subjecting said aqueous solution simultaneously to a magnetic field and photolysis radiation such that said photolysis radiation produces light rays that run substantially parallel or anti-parallel to the magnetic field passing through said aqueous solution, wherein said catalyst reacts with said initial reactant to form the enantiomeric organic compound having a high amount of enantiomer excesses.

  10. Unstructured Mesh Methods for the Simulation of Hypersonic Flows

    NASA Technical Reports Server (NTRS)

    Peraire, Jaime; Bibb, K. L. (Technical Monitor)

    2001-01-01

    This report describes research work undertaken at the Massachusetts Institute of Technology. The aim of this research is to identify effective algorithms and methodologies for the efficient and routine solution of hypersonic viscous flows about re-entry vehicles. For over ten years we have received support from NASA to develop unstructured mesh methods for Computational Fluid Dynamics. As a result of this effort, a methodology based on the use of unstructured adapted meshes of tetrahedra and finite volume flow solvers has been developed. A number of gridding algorithms, flow solvers, and adaptive strategies have been proposed. The most successful algorithms form the basis of the unstructured mesh system FELISA. The FELISA system has been used extensively for the analysis of transonic and hypersonic flows about complete vehicle configurations. The system is highly automatic and allows for the routine aerodynamic analysis of complex configurations starting from CAD data. The code has been parallelized and utilizes efficient solution algorithms. For hypersonic flows, a version of the code that incorporates real gas effects has been produced. One of the latest developments before the start of this grant was to extend the system to include viscous effects. This required the development of viscous mesh generators, capable of generating the anisotropic grids required to represent boundary layers, and viscous flow solvers. Figures 1 and 2 show sample hypersonic viscous computations using the developed viscous generators and solvers. Although these initial results were encouraging, it became apparent that in order to develop a fully functional capability for viscous flows, several advances in gridding, solution accuracy, robustness and efficiency were required. As part of this research we have developed: 1) automatic meshing techniques, with the corresponding computer codes delivered to NASA and implemented into the GridEx system; 2) a finite element algorithm for the solution of the viscous compressible flow equations which can solve flows all the way down to the incompressible limit and can use higher-order (quadratic) approximations, leading to highly accurate answers; and 3) iterative algebraic multigrid solution techniques.

  11. Towards a large-scale scalable adaptive heart model using shallow tree meshes

    NASA Astrophysics Data System (ADS)

    Krause, Dorian; Dickopf, Thomas; Potse, Mark; Krause, Rolf

    2015-10-01

    Electrophysiological heart models are sophisticated computational tools that place high demands on the computing hardware due to the high spatial resolution required to capture the steep depolarization front. To address this challenge, we present a novel adaptive scheme for resolving the depolarization front accurately using adaptivity in space. Our adaptive scheme is based on locally structured meshes. These tensor meshes in space are organized in a parallel forest of trees, which allows us to resolve complicated geometries and to realize high variations in the local mesh sizes with a minimal memory footprint in the adaptive scheme. We discuss both a non-conforming mortar element approximation and a conforming finite element space, and present an efficient technique for the assembly of the respective stiffness matrices using matrix representations of the inclusion operators into the product space on the so-called shallow tree meshes. We analyzed the parallel performance and scalability for a two-dimensional ventricle slice as well as for a full large-scale heart model. Our results demonstrate that the method has good performance and high accuracy.

  12. Performance evaluation of GPU parallelization, space-time adaptive algorithms, and their combination for simulating cardiac electrophysiology.

    PubMed

    Sachetto Oliveira, Rafael; Martins Rocha, Bernardo; Burgarelli, Denise; Meira, Wagner; Constantinides, Christakis; Weber Dos Santos, Rodrigo

    2018-02-01

    The use of computer models as a tool for the study and understanding of the complex phenomena of cardiac electrophysiology has attained increased importance nowadays. At the same time, the increased complexity of the biophysical processes translates into complex computational and mathematical models. To speed up cardiac simulations and to allow more precise and realistic uses, two different techniques have been traditionally exploited: parallel computing and sophisticated numerical methods. In this work, we combine a modern parallel computing technique based on multicore and graphics processing units (GPUs) and a sophisticated numerical method based on a new space-time adaptive algorithm. We evaluate each technique alone and in different combinations: multicore and GPU; multicore, GPU, and space adaptivity; multicore, GPU, space adaptivity, and time adaptivity. All the techniques and combinations were evaluated under different scenarios: 3D simulations on slabs, 3D simulations on a ventricular mouse mesh (i.e., complex geometry), and sinus-rhythm and arrhythmic conditions. Our results suggest that multicore and GPU accelerate the simulations by an approximate factor of 33×, whereas the speedups attained by the space-time adaptive algorithms were approximately 48×. Nevertheless, by combining all the techniques, we obtained speedups that ranged between 165× and 498×. The tested methods were able to reduce the execution time of a simulation by more than 498× for a complex cellular model in a slab geometry and by 165× in a realistic heart geometry simulating spiral waves. The proposed methods will allow faster and more realistic simulations in a feasible time with no significant loss of accuracy. Copyright © 2017 John Wiley & Sons, Ltd.

  13. The Role of Instability Waves in Predicting Jet Noise

    NASA Technical Reports Server (NTRS)

    Goldstein, M. E.; Leib, S. J.

    2004-01-01

    There has been an ongoing debate about the role of linear instability waves in the prediction of jet noise. Parallel mean flow models, such as the one proposed by Lilley, usually neglect these waves because they cause the solution to become infinite. The resulting solution is then non-causal and can, therefore, be quite different from the true causal solution for the chaotic flows being considered here. The present paper solves the relevant acoustic equations for a non-parallel mean flow by using a vector Green's function approach and assuming the mean flow to be weakly non-parallel, i.e., assuming the spread rate to be small. It demonstrates that linear instability waves must be accounted for in order to construct a proper causal solution to the jet noise problem. Recent experimental results (e.g., see Tam, Golebiowski, and Seiner, 1996) show that the small-angle spectra radiated by supersonic jets are quite different from those radiated at larger angles (say, at 90deg) and even exhibit dissimilar frequency scalings (i.e., they scale with Helmholtz number as opposed to Strouhal number). The present solution is, among other things, able to explain this rather puzzling experimental result.

  14. Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

    NASA Astrophysics Data System (ADS)

    Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

    2012-09-01

    Computer simulations are important in current cosmological research. These simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses octree adaptive mesh refinement. Compared to grid-based AMR, octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs specific software to be visualized, as generic visualization tools work on Cartesian grid data types. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the high-performance computers that run the RAMSES simulations, it also uses MPI and multiprocessing to run parallel code. We present the PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR: the first is a splatting technique, and the second is a custom ray-tracing technique. Each has its own advantages and drawbacks. We have also compared two parallel programming techniques, the Python multiprocessing library versus the use of MPI runs. The load-balancing strategy has to be defined carefully in order to achieve a good speedup in the computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.

  15. Efficient parallel implicit methods for rotary-wing aerodynamics calculations

    NASA Astrophysics Data System (ADS)

    Wissink, Andrew M.

    Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by the lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel, and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower-Upper Symmetric Gauss-Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier-Stokes (TURNS) code. The new hybrid LU-SGS scheme couples the point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss-Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using the Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multiprocessors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases, shows a good degree of parallel efficiency, and exhibits greater robustness than DP-LUR for third-order upwind solutions. The second part of this work examines the use of Krylov subspace iterative solvers for the nonlinear CFD solutions, with the hybrid LU-SGS scheme used as a parallelizable preconditioner. Two iterative methods are tested: Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OSGCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as with the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy, which allows the use of larger timesteps and results in CPU savings of 20-35%.
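
    The trade-off the thesis navigates can be seen in miniature: a Gauss-Seidel sweep uses freshly updated values and converges faster, but is a sequential recurrence, while point relaxation (Jacobi, as in DP-LUR) uses only old values and parallelizes trivially. A small NumPy comparison of our own, not code from TURNS:

        import numpy as np

        def jacobi(A, b, x, sweeps):
            # Point relaxation: every update uses only old values, so all rows
            # can be updated in parallel (the DP-LUR idea).
            D = np.diag(A)
            R = A - np.diag(D)
            for _ in range(sweeps):
                x = (b - R @ x) / D
            return x

        def gauss_seidel(A, b, x, sweeps):
            # Each update uses the newest values: faster, but a sequential recurrence.
            x = x.copy()
            for _ in range(sweeps):
                for i in range(len(b)):
                    x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
            return x

        n = 100
        A = np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
        b = np.ones(n)
        x0 = np.zeros(n)
        for sweeps in (5, 20, 80):
            rj = np.linalg.norm(b - A @ jacobi(A, b, x0, sweeps))
            rg = np.linalg.norm(b - A @ gauss_seidel(A, b, x0, sweeps))
            print(sweeps, rj, rg)   # Gauss-Seidel converges in roughly half the sweeps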

  16. Solution-phase parallel synthesis of aryloxyimino amides via a novel multicomponent reaction among aromatic (Z)-chlorooximes, isocyanides, and electron-deficient phenols.

    PubMed

    Mercalli, Valentina; Giustiniano, Mariateresa; Del Grosso, Erika; Varese, Monica; Cassese, Hilde; Massarotti, Alberto; Novellino, Ettore; Tron, Gian Cesare

    2014-11-10

    A library of 41 aryloxyimino amides was prepared via solution-phase parallel synthesis by extending the multicomponent reaction of (Z)-chlorooximes and isocyanides to the use of electron-deficient phenols. The resulting aryloxyimino amide derivatives can be used as intermediates for the synthesis of benzo[d]isoxazole-3-carboxamides, dramatically reducing the number of synthetic steps required by other methods reported in the literature.

  17. Data processing pipeline for serial femtosecond crystallography at SACLA.

    PubMed

    Nakane, Takanori; Joti, Yasumasa; Tono, Kensuke; Yabashi, Makina; Nango, Eriko; Iwata, So; Ishitani, Ryuichiro; Nureki, Osamu

    2016-06-01

    A data processing pipeline for serial femtosecond crystallography at SACLA was developed, based on Cheetah [Barty et al. (2014). J. Appl. Cryst. 47, 1118-1131] and CrystFEL [White et al. (2016). J. Appl. Cryst. 49, 680-689]. The original programs were adapted for data acquisition through the SACLA API, thread and inter-node parallelization, and efficient image handling. The pipeline consists of two stages: the first, online stage can analyse all images in real time, with a latency of less than a few seconds, to provide feedback on hit rate and detector saturation; the second, offline stage converts hit images into HDF5 files and runs CrystFEL for indexing and integration. The size of the filtered compressed output is comparable to that of a synchrotron data set. The pipeline enables real-time feedback and rapid structure solution during beamtime.

  18. Development of 3D electromagnetic modeling tools for airborne vehicles

    NASA Technical Reports Server (NTRS)

    Volakis, John L.

    1992-01-01

    The main goal of this report is to advance the development of methodologies for scattering by airborne composite vehicles. Although the primary focus continues to be the development of a general-purpose computer code for analyzing the entire structure as a single unit, a number of other tasks are being pursued in parallel with this effort. One of these tasks, discussed within, concerns new finite element formulations and mesh termination schemes; the goal here is to decrease computation time while retaining accuracy and geometric adaptability. The second task focuses on the application of wavelets to electromagnetics. Wavelet transformations are shown to be able to reduce a full matrix to a band matrix, thereby reducing the solution's memory requirements. Included within this document are two separate papers on finite element formulations and wavelets.
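
    The band-compression claim can be illustrated in a few lines (our toy example, not from the report): applying a discrete Haar transform to the rows and columns of a smooth, dense kernel matrix concentrates the significant entries, so thresholding leaves a sparse, nearly banded approximation.

        import numpy as np

        def haar_matrix(n):
            # Orthonormal single-level Haar transform as an n x n matrix (n even).
            H = np.zeros((n, n))
            r = 0.5 ** 0.5
            for k in range(n // 2):
                H[k, 2 * k:2 * k + 2] = [r, r]            # averages
                H[n // 2 + k, 2 * k:2 * k + 2] = [r, -r]  # details
            return H

        n = 256
        i, j = np.indices((n, n))
        K = 1.0 / (1.0 + np.abs(i - j))   # smooth, fully dense kernel matrix
        W = haar_matrix(n) @ K @ haar_matrix(n).T
        for tol in (1e-1, 1e-2, 1e-3):
            frac = np.mean(np.abs(W) > tol * np.abs(W).max())
            print(tol, frac)              # few entries are significant; dropping
                                          # them leaves a sparse, band-like matrix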

  19. Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications

    NASA Technical Reports Server (NTRS)

    Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak

    2001-01-01

    The Information Power Grid (IPG) concept developed by NASA is aimed at providing a metacomputing platform for large-scale distributed computations, hiding the intricacies of a highly heterogeneous environment while maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG, and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.

  20. Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data.

    PubMed

    Gomez-Pulido, Juan A; Cerrada-Barrios, Jose L; Trinidad-Amado, Sebastian; Lanza-Gutierrez, Jose M; Fernandez-Diaz, Ramon A; Crawford, Broderick; Soto, Ricardo

    2016-08-31

    Metaheuristics are widely used to solve large combinatorial optimization problems in bioinformatics because of the huge set of possible solutions. Two representative problems are gene selection for cancer classification and biclustering of gene expression data. In most cases, these metaheuristics, like other non-linear techniques, apply a fitness function to each possible solution in a size-limited population, and that step involves higher latencies than other parts of the algorithm; the execution time of the application therefore depends mainly on the execution time of the fitness function. In addition, fitness functions are usually formulated in floating-point arithmetic. A careful parallelization of these functions using reconfigurable hardware technology will therefore accelerate the computation, especially if they are applied in parallel to several solutions of the population. A fine-grained parallelization of two floating-point fitness functions of different complexities and features, drawn from biclustering of gene expression data and gene selection for cancer classification, yielded higher speedups and reduced power consumption compared with conventional microprocessors. The results show better performance, in both computing time and power consumption, using reconfigurable hardware instead of conventional microprocessors, not only because of the parallelization of the arithmetic operations but also thanks to the concurrent fitness evaluation of several individuals of the population in the metaheuristic. This is a good basis for building accelerated and low-energy solutions for intensive computing scenarios.
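
    The concurrency the paper exploits in hardware, evaluating the fitness of many individuals at once, has a simple software analogue. A minimal sketch (ours, with a made-up fitness function and Python's multiprocessing standing in for the reconfigurable hardware):

        import numpy as np
        from multiprocessing import Pool

        def fitness(individual):
            # Placeholder floating-point fitness; a real biclustering residue or
            # classifier-accuracy computation would go here.
            x = np.asarray(individual)
            return float(np.sum(np.sin(x) ** 2 + np.log1p(x ** 2)))

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            population = [rng.normal(size=4096) for _ in range(64)]
            with Pool() as pool:               # evaluate many individuals concurrently
                scores = pool.map(fitness, population)
            best = int(np.argmin(scores))
            print(best, scores[best])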

  1. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated: a portion of the adaptive tactical navigator and a real-time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations and the results of performance analyses, showing speedup due to parallelism and initial efficiency improvements, are detailed, and further areas for performance improvement are suggested.

  2. Attention and apparent motion.

    PubMed

    Horowitz, T; Treisman, A

    1994-01-01

    Two dissociations between short- and long-range motion in visual search are reported. Previous research has shown parallel processing for short-range motion and apparently serial processing for long-range motion. This finding has been replicated, and it has also been found that search for short-range targets can be impaired both by using bicontrast stimuli and by prior adaptation to the target direction of motion. Neither factor impaired search in long-range motion displays. Adaptation actually facilitated search with long-range displays, which is attributed to response-level effects. A feature-integration account of apparent motion is proposed. In this theory, short-range motion depends on specialized motion feature detectors operating in parallel across the display, but subject to selective adaptation, whereas attention is needed to link successive elements when they appear at greater separations, or across opposite contrasts.

  3. Robust adaptive antiswing control of underactuated crane systems with two parallel payloads and rail length constraint.

    PubMed

    Zhang, Zhongcai; Wu, Yuqiang; Huang, Jinming

    2016-11-01

    The antiswing control and accurate positioning are simultaneously investigated for underactuated crane systems in the presence of two parallel payloads on the trolley and rail length limitation. The equations of motion for the crane system in question are established via the Euler-Lagrange equation. An adaptive control strategy is proposed with the help of system energy function and energy shaping technique. Stability analysis shows that under the designed adaptive controller, the payload swings can be suppressed ultimately and the trolley can be regulated to the destination while not exceeding the pre-specified boundaries. Simulation results are provided to show the satisfactory control performances of the presented control method in terms of working efficiency as well as robustness with respect to external disturbances. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  4. Performance Analysis and Portability of the PLUM Load Balancing System

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

    1998-01-01

    The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on an SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gamblin, T; de Supinski, B R; Schulz, M

    Good load balance is crucial on very large parallel systems, but the most sophisticated algorithms introduce dynamic imbalances through adaptation in domain decomposition or use of adaptive solvers. To observe and diagnose imbalance, developers need system-wide, temporally-ordered measurements from full-scale runs. This potentially requires data collection from multiple code regions on all processors over the entire execution. Doing this instrumentation naively can, in combination with the application itself, exceed available I/O bandwidth and storage capacity, and can induce severe behavioral perturbations. We present and evaluate a novel technique for scalable, low-error load balance measurement. This uses a parallel wavelet transform and other parallel encoding methods. We show that our technique collects and reconstructs system-wide measurements with low error. Compression time scales sublinearly with system size, and the data volume is several orders of magnitude smaller than the raw data. The overhead is low enough for online use in a production environment.
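
    A minimal software analogue of the compression step (our sketch, assuming the PyWavelets package is available; the paper computes its wavelet transform in parallel across processors): transform a per-rank load trace, keep only the largest coefficients, and measure the reconstruction error.

        import numpy as np
        import pywt  # PyWavelets, assumed installed

        # Synthetic load measurements for one code region across 1024 MPI ranks.
        rng = np.random.default_rng(0)
        ranks = np.arange(1024)
        load = 1.0 + 0.2 * np.sin(2 * np.pi * ranks / 1024) + 0.01 * rng.normal(size=1024)

        coeffs = pywt.wavedec(load, "db4", level=5)               # multilevel transform
        arr, slices = pywt.coeffs_to_array(coeffs)
        arr[np.abs(arr) < np.percentile(np.abs(arr), 95)] = 0.0   # keep top 5 percent
        recon = pywt.waverec(pywt.array_to_coeffs(arr, slices, output_format="wavedec"),
                             "db4")[:load.size]
        print(np.max(np.abs(recon - load)))   # ~20x compression, small reconstruction error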

  6. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language, aCe C, is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe C and present a signal processing application (FFT).

  7. MPF: A portable message passing facility for shared memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

    1987-01-01

    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications: linear system solution and iterative solution of partial differential equations.

  8. Method for six-legged robot stepping on obstacles by indirect force estimation

    NASA Astrophysics Data System (ADS)

    Xu, Yilin; Gao, Feng; Pan, Yang; Chai, Xun

    2016-07-01

    Adaptive gaits for legged robots often require force sensors installed on the foot-tips; however, impact, temperature or humidity can affect or even damage those sensors. Efforts have been made to realize indirect force estimation on legged robots whose legs are based on planar mechanisms. Robot Octopus III is a six-legged robot using spatial parallel mechanism (UP-2UPS) legs. This paper proposes a novel method to realize indirect force estimation on a walking robot based on a spatial parallel mechanism. The direct and inverse kinematics models are established, the force Jacobian matrix is derived from the kinematics model, and thus the indirect force estimation model is obtained. The relation between the output torques of the three motors installed on one leg and the external force exerted on the foot tip is then described. Furthermore, an adaptive tripod static gait is designed: the robot alters its leg trajectory to step onto obstacles using the proposed adaptive gait. Both the indirect force estimation model and the adaptive gait are implemented and optimized in a real-time control system. One experiment validates the indirect force estimation model, and a second tests the adaptive gait. The results show that the robot can successfully step onto a 0.2 m-high obstacle. The proposed method allows the six-legged robot with spatial parallel mechanism legs to overcome obstacles while avoiding electric force sensors in the harsh environment at the robot's foot tips.
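
    The estimation step reduces to the static identity tau = J(q)^T F between joint torques and the foot-tip force; inverting it gives the indirect estimate. A minimal sketch with a made-up 3-DoF Jacobian (the real UP-2UPS leg Jacobian comes from the paper's kinematics model, which we do not reproduce):

        import numpy as np

        def estimate_foot_force(J, tau):
            """Indirect force estimate: solve J^T F = tau for the foot-tip force F."""
            return np.linalg.solve(J.T, tau)

        # Hypothetical 3x3 leg Jacobian at some configuration (not the UP-2UPS one).
        J = np.array([[0.30, 0.00, 0.10],
                      [0.00, 0.25, 0.05],
                      [0.05, 0.05, 0.20]])
        F_true = np.array([0.0, 0.0, -50.0])   # e.g., a 50 N supporting load
        tau = J.T @ F_true                     # motor torques we would measure
        print(estimate_foot_force(J, tau))     # recovers F_true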

  9. Optimizing trial design in pharmacogenetics research: comparing a fixed parallel group, group sequential, and adaptive selection design on sample size requirements.

    PubMed

    Boessen, Ruud; van der Baan, Frederieke; Groenwold, Rolf; Egberts, Antoine; Klungel, Olaf; Grobbee, Diederick; Knol, Mirjam; Roes, Kit

    2013-01-01

    Two-stage clinical trial designs may be efficient in pharmacogenetics research when there is some but inconclusive evidence of effect modification by a genomic marker. Two-stage designs allow stopping early for efficacy or futility and can offer the additional opportunity to enrich the study population to a specific patient subgroup after an interim analysis. This study compared sample size requirements for fixed parallel group, group sequential, and adaptive selection designs with equal overall power and control of the family-wise type I error rate. The designs were evaluated across scenarios that defined the effect sizes in the marker-positive and marker-negative subgroups and the prevalence of marker-positive patients in the overall study population. Effect sizes were chosen to reflect realistic planning scenarios, where at least some effect is present in the marker-negative subgroup. In addition, scenarios were considered in which the assumed 'true' subgroup effects (i.e., the postulated effects) differed from those hypothesized at the planning stage. As expected, both two-stage designs generally required fewer patients than a fixed parallel group design, and the advantage increased as the difference between subgroups increased. The adaptive selection design added little further reduction in sample size, as compared with the group sequential design, when the postulated effect sizes were equal to those hypothesized at the planning stage. However, when the postulated effects deviated strongly in favor of enrichment, the comparative advantage of the adaptive selection design increased, which precisely reflects the adaptive nature of the design. Copyright © 2013 John Wiley & Sons, Ltd.
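
    The sample-size comparison can be imitated with a toy Monte Carlo (our illustration only: one interim look, a Pocock-type efficacy boundary of about 2.178 for two looks at overall two-sided alpha = 0.05, unit-variance normal outcomes, and no subgroups or enrichment):

        import numpy as np

        rng = np.random.default_rng(0)

        def power_fixed(n, delta, sims=10000):
            # Fixed parallel-group design: one two-sided z-test with n per arm.
            t = rng.normal(delta, 1.0, (sims, n)).mean(axis=1)
            c = rng.normal(0.0, 1.0, (sims, n)).mean(axis=1)
            z = (t - c) / np.sqrt(2.0 / n)
            return float(np.mean(np.abs(z) > 1.96))

        def power_group_sequential(n, delta, sims=10000, crit=2.178):
            # One interim analysis at n/2 per arm; stop early for efficacy if the
            # interim z-statistic crosses the Pocock-type boundary.
            half = n // 2
            t = rng.normal(delta, 1.0, (sims, n))
            c = rng.normal(0.0, 1.0, (sims, n))
            z1 = (t[:, :half].mean(1) - c[:, :half].mean(1)) / np.sqrt(2.0 / half)
            z2 = (t.mean(1) - c.mean(1)) / np.sqrt(2.0 / n)
            stop = np.abs(z1) > crit
            power = float(np.mean(stop | (np.abs(z2) > crit)))
            expected_n = half * stop.mean() + n * (1.0 - stop.mean())
            return power, float(expected_n)

        print(power_fixed(100, 0.4))               # fixed design, n = 100 per arm
        print(power_group_sequential(100, 0.4))    # similar power, smaller expected n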

  10. Finite element analysis and genetic algorithm optimization design for the actuator placement on a large adaptive structure

    NASA Astrophysics Data System (ADS)

    Sheng, Lizeng

    The dissertation focuses on one of the major research needs in the area of adaptive/intelligent/smart structures: the development and application of finite element analysis and genetic algorithms for the optimal design of large-scale adaptive structures. We first review some basic concepts in the finite element method and genetic algorithms, along with the research on smart structures. Then we propose a solution methodology for a critical problem in the design of the next generation of large-scale adaptive structures: the optimal placement of a large number of actuators to control thermal deformations. After briefly reviewing the three most frequently used general approaches for deriving a finite element formulation, the dissertation presents techniques associated with general shell finite element analysis using flat triangular laminated composite elements. The element used here has three nodes and eighteen degrees of freedom and is obtained by combining a triangular membrane element and a triangular plate bending element; it includes the coupling effect between membrane deformation and bending deformation. The membrane element is derived from the linear strain triangular element using Cook's transformation, and the discrete Kirchhoff triangular (DKT) element is used as the plate bending element; for completeness, a complete derivation of the DKT is presented. A geometrically nonlinear finite element formulation is derived for the analysis of adaptive structures under combined thermal and electrical loads. Next, we solve the optimization problem of placing a large number of piezoelectric actuators to control thermal distortions in a large mirror in the presence of four different thermal loads, and then extend this to a multi-objective optimization problem of determining a single set of piezoelectric actuator locations that can be used to control the deformation in the same mirror under any one of the four thermal loads. A series of genetic algorithms, GA Versions 1, 2 and 3, were developed to find the optimal locations of piezoelectric actuators from on the order of 10^21 to 10^56 candidate placements. Introducing a variable population approach, we improve the flexibility of the selection operation in genetic algorithms, and by incorporating mutation and hill climbing into micro-genetic algorithms, we develop a more efficient genetic algorithm. Through extensive numerical experiments, we find that the design search space for the optimal placement of a large number of actuators is highly multi-modal, and that the most distinct property of genetic algorithms is their robustness: they give results that are random but with only slight variability. The genetic algorithms can be used to obtain an adequate solution using a limited number of evaluations; to get the highest-quality solution, multiple runs with different random seed generators are necessary. The investigation time can be significantly reduced using very coarse-grained parallel computing. Overall, the methodology of using finite element analysis with genetic algorithm optimization provides a robust solution approach for the challenging problem of optimally placing a large number of actuators in the design of the next generation of adaptive structures.
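
    To make the search concrete, here is a small genetic algorithm of our own for choosing k actuator sites out of N, with tournament selection, swap mutation, and elitism; the synthetic spread-out-the-actuators objective merely stands in for the dissertation's finite element thermal-deformation analysis:

        import numpy as np

        rng = np.random.default_rng(0)
        N, K, POP, GENS = 200, 12, 40, 300     # candidate sites, actuators, GA sizes
        site_xy = rng.uniform(size=(N, 2))     # synthetic actuator site coordinates

        def fitness(sites):
            # Stand-in objective: reward well-spread actuators (not the FE analysis).
            p = site_xy[np.fromiter(sites, int)]
            d = np.linalg.norm(p[:, None] - p[None, :], axis=-1)
            return -np.sum(np.sort(d[d > 0])[:K])

        def mutate(ind):
            out = set(ind)
            out.remove(rng.choice(list(out)))  # swap one chosen site for a random one
            while len(out) < K:
                out.add(int(rng.integers(N)))
            return frozenset(out)

        pop = [frozenset(rng.choice(N, K, replace=False)) for _ in range(POP)]
        for _ in range(GENS):
            ranked = sorted(pop, key=fitness, reverse=True)
            def tournament():                  # binary tournament selection
                a, b = rng.integers(POP, size=2)
                return max(pop[a], pop[b], key=fitness)
            pop = ranked[:2] + [mutate(tournament()) for _ in range(POP - 2)]  # elitism
        best = max(pop, key=fitness)
        print(sorted(int(s) for s in best))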

  11. Unstructured Adaptive Meshes: Bad for Your Memory?

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob

    2003-01-01

    This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) problems with localized error sources benefit from adaptive nonuniform meshes; 2) certain machines perform poorly on such problems; 3) parallel implementation may provide further performance improvement but is difficult. Some examples of topics involving irregular dynamical memory access include: 1) the heat transfer problem; 2) the heat source term; 3) the spectral element method; 4) basis functions; 5) elemental discrete equations; 6) global discrete equations. The nonconforming mesh and the mortar element method are covered in greater detail in this presentation.

  12. Parallel processors and nonlinear structural dynamics algorithms and software

    NASA Technical Reports Server (NTRS)

    Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.

    1989-01-01

    The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction, multiple data) computer, the Connection Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the Connection Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the Connection Machine is capable of outperforming the CRAY X-MP/14.

  13. Host shifts result in parallel genetic changes when viruses evolve in closely related species

    PubMed Central

    Day, Jonathan P.; Smith, Sophia C. L.; Houslay, Thomas M.; Tagliaferri, Lucia

    2018-01-01

    Host shifts, where a pathogen invades and establishes in a new host species, are a major source of emerging infectious diseases. They frequently occur between related host species and often rely on the pathogen evolving adaptations that increase their fitness in the novel host species. To investigate genetic changes in novel hosts, we experimentally evolved replicate lineages of an RNA virus (Drosophila C Virus) in 19 different species of Drosophilidae and deep sequenced the viral genomes. We found a strong pattern of parallel evolution, where viral lineages from the same host were genetically more similar to each other than to lineages from other host species. When we compared viruses that had evolved in different host species, we found that parallel genetic changes were more likely to occur if the two host species were closely related. This suggests that when a virus adapts to one host it might also become better adapted to closely related host species. This may explain in part why host shifts tend to occur between related species, and may mean that when a new pathogen appears in a given species, closely related species may become vulnerable to the new disease. PMID:29649296

  14. Global analysis of genes involved in freshwater adaptation in threespine sticklebacks (Gasterosteus aculeatus).

    PubMed

    DeFaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Goto, Akira; Merilä, Juha

    2011-06-01

    Examples of parallel evolution of phenotypic traits have been repeatedly demonstrated in threespine sticklebacks (Gasterosteus aculeatus) across their global distribution. Using these as a model, we performed a targeted genome scan--focusing on physiologically important genes potentially related to freshwater adaptation--to identify genetic signatures of parallel physiological evolution on a global scale. To this end, 50 microsatellite loci, including 26 loci within or close to (<6 kb) physiologically important genes, were screened in paired marine and freshwater populations from six locations across the Northern Hemisphere. Signatures of directional selection were detected in 24 loci, including 17 physiologically important genes, in at least one location. Although no loci showed consistent signatures of selection in all divergent population pairs, several outliers were common in multiple locations. In particular, seven physiologically important genes, as well as reference ectodysplasin gene (EDA), showed signatures of selection in three or more locations. Hence, although these results give some evidence for consistent parallel molecular evolution in response to freshwater colonization, they suggest that different evolutionary pathways may underlie physiological adaptation to freshwater habitats within the global distribution of the threespine stickleback. © 2011 The Author(s). Evolution© 2011 The Society for the Study of Evolution.

  15. Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

    NASA Technical Reports Server (NTRS)

    Sues, R. H.; Lua, Y. J.; Smith, M. D.

    1994-01-01

    The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.

  16. Evaluation of truncation error and adaptive grid generation for the transonic full potential flow calculations

    NASA Technical Reports Server (NTRS)

    Nakamura, S.

    1983-01-01

    The effects of truncation error on the numerical solution of transonic flows using the full potential equation are studied. The effects of adapting grid point distributions to various solution aspects, including shock waves, are also discussed. A conclusion is that a rapid change of grid spacing is damaging to the accuracy of the flow solution. Therefore, in a solution-adaptive grid application, an optimal grid is obtained as a tradeoff between the amount of grid refinement and the rate of grid stretching.
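
    The conclusion about rapid spacing changes can be checked in a few lines (our demonstration): the 3-point second-derivative formula on unequal spacings h1, h2 has leading truncation error proportional to (h2 - h1), so it drops from second- to first-order accuracy where the grid stretches abruptly.

        import numpy as np

        def d2_nonuniform(f, x, h1, h2):
            # 3-point second derivative with left spacing h1 and right spacing h2;
            # the leading truncation error is (h2 - h1) * f'''(x) / 3.
            return 2.0 * ((f(x + h2) - f(x)) / h2 + (f(x - h1) - f(x)) / h1) / (h1 + h2)

        f, x, exact = np.sin, 0.7, -np.sin(0.7)
        for h in (1e-2, 1e-3, 1e-4):
            err_uniform = abs(d2_nonuniform(f, x, h, h) - exact)          # O(h^2)
            err_stretched = abs(d2_nonuniform(f, x, h, 3.0 * h) - exact)  # O(h)
            print(h, err_uniform, err_stretched)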

  17. High-Resolution Numerical Simulation and Analysis of Mach Reflection Structures in Detonation Waves in Low-Pressure H2-O2-Ar Mixtures: A Summary of Results Obtained with the Adaptive Mesh Refinement Framework AMROC

    DOE PAGES

    Deiterding, Ralf

    2011-01-01

    Numerical simulation can be key to the understanding of the multidimensional nature of transient detonation waves. However, the accurate approximation of realistic detonations is demanding, as a wide range of scales needs to be resolved. This paper describes a successful solution strategy that utilizes logically rectangular dynamically adaptive meshes. The hydrodynamic transport scheme and the treatment of the nonequilibrium reaction terms are sketched. A ghost fluid approach is integrated into the method to allow for embedded geometrically complex boundaries. Large-scale parallel simulations of unstable detonation structures of Chapman-Jouguet detonations in low-pressure hydrogen-oxygen-argon mixtures demonstrate the efficiency of the described techniques in practice. In particular, computations of regular cellular structures in two and three space dimensions and their development under transient conditions, that is, under diffraction and for propagation through bends, are presented. Some of the observed patterns are classified by shock polar analysis, and a diagram of the transition boundaries between possible Mach reflection structures is constructed.

  18. Shedding New Light on Exploding Stars: Terascale Simulations of Neutrino-Driven Supernovas and Their Nucleosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dennis C. Smolarski, S.J.

    Project Abstract: This project was a continuation of work begun under a subcontract issued off of TSI-DOE Grant 1528746, awarded to the University of Illinois Urbana-Champaign. Dr. Anthony Mezzacappa is the Principal Investigator on the Illinois award. A separate award was issued to Santa Clara University to continue the collaboration during the time period May 2003-2004. Smolarski continued to work on preconditioner technology and its interface with various iterative methods. He worked primarily with F. Douglas Swesty (SUNY-Stony Brook) in continuing software development started in the 2002-03 academic year. Special attention was paid to the development and testing of different sparse approximate inverse preconditioners and their use in the solution of linear systems arising from radiation transport equations. The target was a high-performance platform on which efficient implementation is a critical component of the overall effort. Smolarski also focused on the integration of the adaptive iterative algorithm, Chebycode, developed by Tom Manteuffel and Steve Ashby and adapted by Ryan Szypowski for parallel platforms, into the radiation transport code being developed at SUNY-Stony Brook.

  19. Exploring equivalence domain in nonlinear inverse problems using Covariance Matrix Adaptation Evolution Strategy (CMAES) and random sampling

    NASA Astrophysics Data System (ADS)

    Grayver, Alexander V.; Kuvshinov, Alexey V.

    2016-05-01

    This paper presents a methodology to sample the equivalence domain (ED) in nonlinear partial differential equation (PDE)-constrained inverse problems. For this purpose, we first apply the state-of-the-art stochastic optimization algorithm called Covariance Matrix Adaptation Evolution Strategy (CMAES) to identify low-misfit regions of the model space. These regions are then randomly sampled to create an ensemble of equivalent models and quantify uncertainty. CMAES is aimed at exploring model space globally and is robust on very ill-conditioned problems. We show that the number of iterations required to converge grows at a moderate rate with respect to the number of unknowns and that the algorithm is embarrassingly parallel. We formulated the problem by using the generalized Gaussian distribution, which enabled us to seamlessly use arbitrary norms for the residual and regularization terms. We show that various regularization norms facilitate studying different classes of equivalent solutions. We further show how the performance of the standard Metropolis-Hastings Markov chain Monte Carlo algorithm can be substantially improved by using the information CMAES provides. This methodology was tested by using individual and joint inversions of magnetotelluric, controlled-source electromagnetic (EM) and global EM induction data.
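    The two ingredients that matter computationally, a misfit with interchangeable residual and regularization norms (the generalized Gaussian formulation) and the harvesting of every low-misfit candidate into an equivalence-domain ensemble, can be sketched in a few lines. The forward model, data, norms and the 10% acceptance threshold below are placeholders, not the paper's; CMA-ES itself comes from the third-party `cma` package.

    ```python
    # Hedged sketch: L-p misfit + L-q regularization, minimized with
    # CMA-ES while every sufficiently low-misfit sample is kept as a
    # member of the equivalence-domain ensemble. Requires `pip install cma`.
    import numpy as np
    import cma

    rng = np.random.default_rng(0)
    d_obs = rng.normal(size=8)                  # synthetic "observed" data

    def forward(m):                             # placeholder forward model
        return np.asarray(m)

    def misfit(m, p=2.0, q=1.0, lam=0.1):
        r = forward(m) - d_obs
        return float(np.sum(np.abs(r) ** p) + lam * np.sum(np.abs(m) ** q))

    es = cma.CMAEvolutionStrategy(8 * [0.0], 0.5, {'maxiter': 50, 'verbose': -9})
    ensemble = []
    while not es.stop():
        models = es.ask()                       # one population (parallelizable)
        fits = [misfit(m) for m in models]
        es.tell(models, fits)
        best = min(fits)                        # keep near-equivalent models
        ensemble += [m for m, f in zip(models, fits) if f <= 1.1 * best]
    print(len(ensemble), es.result.fbest)
    ```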

  20. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain, consisting of a set of mesh generation tools, to produce the surface and volume meshes for ion channel systems. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG), developed by one of the authors, which provides the capability of doing large-scale parallel computations with high parallel efficiency and the flexibility of choosing high-order elements to achieve high-order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve; with these, both primitive and transformed PNP equations are studied and their numerical performances compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage-dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite-size effects can now be included in the PNP model, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP effectively lead to reduced current in the channel, and the results are closer to the BD simulation results.

  1. Time Parallel Solution of Linear Partial Differential Equations on the Intel Touchstone Delta Supercomputer

    NASA Technical Reports Server (NTRS)

    Toomarian, N.; Fijany, A.; Barhen, J.

    1993-01-01

    Evolutionary partial differential equations are usually solved by discretization in time and space, and by applying a marching-in-time procedure to data and algorithms potentially parallelized in the spatial domain.

  2. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    NASA Astrophysics Data System (ADS)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  3. On nonlinear finite element analysis in single-, multi- and parallel-processors

    NASA Technical Reports Server (NTRS)

    Utku, S.; Melosh, R.; Islam, M.; Salama, M.

    1982-01-01

    Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, thereby establishing the feasibility of the finite element method for nonlinear analysis. Organization and flow of data for various types of digital computers, such as single-processor/single-level-memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processor machines, with and without substructuring (i.e., partitioning), are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable-size substructures to parallel processors is exploited. Under Cholesky-type factorization schemes, the efficiency of parallel processing is shown to decrease due to occasionally shared data, just as it does due to shared facilities.

  4. Single-agent parallel window search

    NASA Technical Reports Server (NTRS)

    Powley, Curt; Korf, Richard E.

    1991-01-01

    Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A* (IDA*) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA* by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.
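    The core idea, each worker running one cost-bounded IDA* iteration with its own threshold instead of iterating thresholds serially, fits in a short sketch. The toy graph, heuristic and window values below are invented for illustration.

    ```python
    # Minimal sketch of parallel window search: several workers run the
    # same cost-bounded depth-first search (one IDA* iteration each)
    # with different thresholds, instead of running them sequentially.
    from concurrent.futures import ThreadPoolExecutor

    graph = {'A': [('B', 1), ('C', 3)], 'B': [('D', 2)],
             'C': [('D', 1)], 'D': []}
    h = {'A': 2, 'B': 2, 'C': 1, 'D': 0}   # admissible heuristic to 'D'

    def bounded_dfs(node, g, threshold, path):
        """One IDA* iteration: DFS pruned where f = g + h exceeds threshold."""
        if g + h[node] > threshold:
            return None
        if node == 'D':
            return path, g
        for succ, cost in graph[node]:
            found = bounded_dfs(succ, g + cost, threshold, path + [succ])
            if found:
                return found
        return None

    # Each worker gets its own window (threshold); true speedup would
    # need processes or a GIL-free runtime, threads just show the shape.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda t: bounded_dfs('A', 0, t, ['A']),
                           [2, 3, 4, 5])
    solutions = [r for r in results if r]
    print(min(solutions, key=lambda s: s[1]))   # cheapest solution found
    ```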

  5. Field-Programmable Gate Array Computer in Structural Analysis: An Initial Exploration

    NASA Technical Reports Server (NTRS)

    Singleterry, Robert C., Jr.; Sobieszczanski-Sobieski, Jaroslaw; Brown, Samuel

    2002-01-01

    This paper reports on an initial assessment of using a Field-Programmable Gate Array (FPGA) computational device as a new tool for solving structural mechanics problems. A FPGA is an assemblage of binary gates arranged in logical blocks that are interconnected via software in a manner dependent on the algorithm being implemented and can be reprogrammed thousands of times per second. In effect, this creates a computer specialized for the problem that automatically exploits all the potential for parallel computing intrinsic in an algorithm. This inherent parallelism is the most important feature of the FPGA computational environment. It is therefore important that if a problem offers a choice of different solution algorithms, an algorithm of a higher degree of inherent parallelism should be selected. It is found that in structural analysis, an 'analog computer' style of programming, which solves problems by direct simulation of the terms in the governing differential equations, yields a more favorable solution algorithm than current solution methods. This style of programming is facilitated by a 'drag-and-drop' graphic programming language that is supplied with the particular type of FPGA computer reported in this paper. Simple examples in structural dynamics and statics illustrate the solution approach used. The FPGA system also allows linear scalability in computing capability. As the problem grows, the number of FPGA chips can be increased with no loss of computing efficiency due to data flow or algorithmic latency that occurs when a single problem is distributed among many conventional processors that operate in parallel. This initial assessment finds the FPGA hardware and software to be in their infancy in regard to the user conveniences; however, they have enormous potential for shrinking the elapsed time of structural analysis solutions if programmed with algorithms that exhibit inherent parallelism and linear scalability. This potential warrants further development of FPGA-tailored algorithms for structural analysis.

  6. Numerical simulation of h-adaptive immersed boundary method for freely falling disks

    NASA Astrophysics Data System (ADS)

    Zhang, Pan; Xia, Zhenhua; Cai, Qingdong

    2018-05-01

    In this work, a freely falling disk with aspect ratio 1/10 is directly simulated by using an adaptive numerical model implemented on a parallel computation framework JASMIN. The adaptive numerical model is a combination of the h-adaptive mesh refinement technique and the implicit immersed boundary method (IBM). Our numerical results agree well with the experimental results in all of the six degrees of freedom of the disk. Furthermore, very similar vortex structures observed in the experiment were also obtained.

  7. Self-Avoiding Walks Over Adaptive Triangular Grids

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Gao, Guang R.; Saini, Subhash (Technical Monitor)

    1999-01-01

    Space-filling curves are a popular, geometric-embedding-based approach to linearizing computational meshes. We present a new O(n log n) combinatorial algorithm for constructing a self-avoiding walk through a two-dimensional mesh containing n triangles. We show that for hierarchical adaptive meshes, the algorithm can be locally adapted and easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful in the runtime partitioning and load balancing of adaptive unstructured grids.

  8. A new variable parallel holes collimator for scintigraphic device with validation method based on Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Trinci, G.; Massari, R.; Scandellari, M.; Boccalini, S.; Costantini, S.; Di Sero, R.; Basso, A.; Sala, R.; Scopinaro, F.; Soluri, A.

    2010-09-01

    The aim of this work is to present a new scintigraphic device able to automatically change the length of its collimator in order to adapt the spatial resolution to the gamma source distance. This patented technique replaces the collimator changes that standard gamma cameras still require. Monte Carlo simulations represent the best tool in searching for new technological solutions for such an innovative collimation structure. They also provide a valid analysis of gamma camera performance, as well as of the advantages and limits of this new solution. Specifically, the Monte Carlo simulations are realized with the GEANT4 (GEometry ANd Tracking) framework, and the specific simulation object is a collimation method based on separate blocks that can be brought closer together or farther apart, in order to reach and maintain specific spatial resolution values for all source-detector distances. To verify the accuracy and faithfulness of these simulations, we performed experimental measurements with an identical setup and conditions. This confirms the power of the simulation as an extremely useful tool, especially where new technological solutions need to be studied, tested and analyzed before their practical realization. The final aim of this new collimation system is the improvement of SPECT techniques, with real control of the spatial resolution value during tomographic acquisitions. This principle allowed us to simulate a tomographic acquisition of two capillaries of radioactive solution, in order to verify the possibility of clearly distinguishing them.

  9. Parallel approach in RDF query processing

    NASA Astrophysics Data System (ADS)

    Vajgl, Marek; Parenica, Jan

    2017-07-01

    Parallelism is nowadays a very cheap way to increase computational power, thanks to the availability of multithreaded computational units, which have become a typical part of today's personal computers and notebooks and are widely spread. This contribution presents experiments on how the evaluation of a computationally complex inference algorithm over RDF data can be parallelized on graphics cards to decrease computation time.

  10. Parallel Algorithm Solves Coupled Differential Equations

    NASA Technical Reports Server (NTRS)

    Hayashi, A.

    1987-01-01

    Numerical methods adapted to concurrent processing. Algorithm solves set of coupled partial differential equations by numerical integration. Adapted to run on hypercube computer, algorithm separates problem into smaller problems solved concurrently. Increase in computing speed with concurrent processing over that achievable with conventional sequential processing appreciable, especially for large problems.
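    The decomposition into smaller concurrently solved problems is, in essence, spatial-domain splitting with ghost-cell exchange. A minimal sketch for an explicit 1-D heat equation follows; the two-block layout and coefficients are assumptions for illustration, not the hypercube code itself.

    ```python
    # Sketch of splitting a PDE over concurrent blocks with ghost-cell
    # reads: explicit 1-D heat equation, two blocks, Dirichlet ends.
    # Sizes and coefficients are illustrative assumptions.
    import numpy as np

    nx, nsteps, alpha = 64, 50, 0.2             # alpha = dt/dx^2 <= 0.5
    u = np.exp(-0.05 * (np.arange(nx) - nx // 2) ** 2)
    blocks = [(1, nx // 2), (nx // 2, nx - 1)]  # interior index ranges

    def step_block(u_old, lo, hi):
        """New values for u[lo:hi], reading ghost cells from u_old."""
        return u_old[lo:hi] + alpha * (
            u_old[lo + 1:hi + 1] - 2 * u_old[lo:hi] + u_old[lo - 1:hi - 1])

    for _ in range(nsteps):
        # the block updates are independent of one another; on a
        # hypercube each would run on its own node after a ghost swap
        parts = [step_block(u, lo, hi) for lo, hi in blocks]
        for (lo, hi), part in zip(blocks, parts):
            u[lo:hi] = part
    print(float(u.max()))
    ```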

  11. Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling

    NASA Technical Reports Server (NTRS)

    Rios, Joseph Lucio; Ross, Kevin

    2009-01-01

    Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.

  12. FWT2D: A massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data—Part 1: Algorithm

    NASA Astrophysics Data System (ADS)

    Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves

    2009-03-01

    This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
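    The heart of the algorithm, one factorization reused across shots plus a gradient assembled from forward and adjoint wavefields, can be sketched in one dimension. SciPy's sparse LU stands in for MUMPS; the model, source and all-receiver geometry below are synthetic assumptions, not the paper's setup.

    ```python
    # Hedged 1-D sketch of one frequency of adjoint-state FWI:
    # factor the Helmholtz operator once, solve forward and adjoint
    # systems, and form the scaled gradient of the misfit.
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n, dx = 200, 10.0
    omega = 2 * np.pi * 5.0
    m_true = np.full(n, 1 / 2000.0 ** 2)     # true slowness squared
    m0 = np.full(n, 1 / 1800.0 ** 2)         # starting model

    lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / dx ** 2

    def helmholtz(m):
        """1-D Helmholtz operator with Dirichlet ends."""
        return (lap + omega ** 2 * sp.diags(m)).tocsc().astype(complex)

    src = np.zeros(n, dtype=complex); src[5] = 1.0
    d_obs = spla.splu(helmholtz(m_true)).solve(src)   # synthetic data

    lu = spla.splu(helmholtz(m0))    # factor once, reuse for every shot
    u = lu.solve(src)                # forward wavefield
    res = u - d_obs                  # data residual (receivers everywhere)
    lam = lu.solve(res.conj())       # adjoint wavefield (A is symmetric)
    grad = -np.real(omega ** 2 * u * lam)        # gradient of 0.5||u-d||^2
    grad /= np.abs(omega ** 2 * u) ** 2 + 1e-30  # crude diagonal-Hessian scaling
    print(np.linalg.norm(grad))
    ```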

  13. Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Chiou, Jin-Chern

    1990-01-01

    Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized, and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion are treated with a two-stage staggered explicit-implicit numerical algorithm that takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference scheme to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques and the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices from which a Schur complement form was derived. By fully exploiting sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the system equations written in Schur complement form. A software testbed was designed and implemented on both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speedup of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.

  14. A parallel algorithm for nonlinear convection-diffusion equations

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.

    1990-01-01

    A parallel algorithm for the efficient solution of nonlinear time-dependent convection-diffusion equations with small parameter on the diffusion term is presented. The method is based on a physically motivated domain decomposition that is dictated by singular perturbation analysis. The analysis is used to determine regions where certain reduced equations may be solved in place of the full equation. The method is suitable for the solution of problems arising in the simulation of fluid dynamics. Experimental results for a nonlinear equation in two-dimensions are presented.

  15. Linearized potential solution for an airfoil in nonuniform parallel streams

    NASA Technical Reports Server (NTRS)

    Prabhu, R. K.; Tiwari, S. N.

    1983-01-01

    A small perturbation potential flow theory is applied to the problem of determining the chordwise pressure distribution, lift and pitching moment of a thin airfoil in the middle of five parallel streams. This theory is then extended to the case of an undisturbed stream having a given smooth velocity profile. Two typical examples are considered and the results obtained are compared with available solutions of Euler's equations. The agreement between these two results is not quite satisfactory. Possible reasons for the differences are indicated.

  16. New Bandwidth Efficient Parallel Concatenated Coding Schemes

    NASA Technical Reports Server (NTRS)

    Benedetto, S.; Divsalar, D.; Montorsi, G.; Pollara, F.

    1996-01-01

    We propose a new solution for the parallel concatenation of trellis codes with multilevel amplitude/phase modulations and a suitable iterative decoding structure. Examples are given for a throughput of 2 bits/sec/Hz with 8PSK and 16QAM signal constellations.

  17. Adapted RF pulse design for SAR reduction in parallel excitation with experimental verification at 9.4 T.

    PubMed

    Wu, Xiaoping; Akgün, Can; Vaughan, J Thomas; Andersen, Peter; Strupp, John; Uğurbil, Kâmil; Van de Moortele, Pierre-François

    2010-07-01

    Parallel excitation holds strong promise to mitigate the impact of large transmit B1 (B1+) distortion at very high magnetic field. Accelerated RF pulses, however, inherently tend to require larger RF peak power, which may result in a substantial increase in the Specific Absorption Rate (SAR) in tissues, a constant concern for patient safety at very high field. In this study, we demonstrate an adapted-rate RF pulse design allowing for SAR reduction while preserving excitation target accuracy. Compared with other proposed implementations of adapted-rate RF pulses, our approach is compatible with any k-space trajectory, does not require an analytical expression of the gradient waveform, and can be used for large flip angle excitation. We demonstrate our method with numerical simulations based on electromagnetic modeling, and we include an experimental verification of transmit pattern accuracy on an 8-transmit-channel 9.4 T system.

  18. Parallel Processing of Adaptive Meshes with Load Balancing

    NASA Technical Reports Server (NTRS)

    Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than those under PLUM by overlapping processing and data migration.
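    Whatever the communication topology (SBN here, PLUM's global scheme elsewhere), the decision at the heart of global rebalancing is who ships how much work to whom. The greedy sketch below illustrates only that generic decision, not the SBN algorithm; the tolerance and example loads are invented.

    ```python
    # Not the SBN algorithm -- a minimal illustration of the global
    # migration decision: pair the most- and least-loaded processors
    # until all loads sit within a tolerance of the mean.
    def rebalance(loads, tol=0.05):
        """Return a list of (src, dst, amount) migrations."""
        mean = sum(loads) / len(loads)
        moves = []
        while True:
            src = max(range(len(loads)), key=lambda i: loads[i])
            dst = min(range(len(loads)), key=lambda i: loads[i])
            amount = min(loads[src] - mean, mean - loads[dst])
            if amount <= tol * mean:
                return moves
            loads[src] -= amount
            loads[dst] += amount
            moves.append((src, dst, amount))

    print(rebalance([10.0, 2.0, 7.0, 1.0]))   # mean load is 5.0
    ```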

  19. On-line range images registration with GPGPU

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Naruniec, J.

    2013-03-01

    This paper concerns the implementation of algorithms for two important aspects of modern 3D data processing: data registration and segmentation. The solution proposed for the first topic is based on 3D space decomposition, while the latter is based on image processing and local neighbourhood search. Data processing is implemented using NVIDIA compute unified device architecture (NVIDIA CUDA) parallel computation. The result of the segmentation is a coloured map where different colours correspond to different objects, such as walls, floor and stairs. The research is related to the problem of collecting 3D data with an RGB-D camera mounted on a rotated head, to be used in mobile robot applications. The performance of the data registration algorithm is aimed at on-line processing. The iterative closest point (ICP) approach is chosen as the registration method. Computations are based on a parallel fast nearest neighbour search. This procedure decomposes 3D space into cubic buckets and, therefore, the time of the matching is deterministic. The first data segmentation technique uses accelerometers integrated with the RGB-D sensor to obtain rotation compensation, and an image processing method for defining prerequisites of the known categories. The second technique uses the adapted nearest neighbour search procedure for obtaining normal vectors for each range point.

  20. CERN data services for LHC computing

    NASA Astrophysics Data System (ADS)

    Espinal, X.; Bocchi, E.; Chan, B.; Fiorot, A.; Iven, J.; Lo Presti, G.; Lopez, J.; Gonzalez, H.; Lamanna, M.; Mascetti, L.; Moscicki, J.; Pace, A.; Peters, A.; Ponce, S.; Rousseau, H.; van der Ster, D.

    2017-10-01

    Dependability, resilience, adaptability and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the broad number of experiments at CERN need to be quickly available in a highly scalable way for large-scale processing and data distribution, while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14 GB/s during the 2015 LHC Pb-Pb run and 11 PB in July 2016) and with concurrent complex production workloads. In parallel, our systems provide the platform for continuous user- and experiment-driven workloads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR for large-scale storage; CERNBox for end-user access and sharing; Ceph as the data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; and AFS for legacy distributed-file-system services. In this paper we summarise the experience in supporting LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment, with pluggable protocols, tuneable QoS, sharing capabilities and fine-grained ACL management, while continuing to guarantee dependable and robust services.

  1. A taxonomy and comparison of parallel block multi-level preconditioners for the incompressible Navier-Stokes equations.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shadid, John Nicolas; Elman, Howard; Shuttleworth, Robert R.

    2007-04-01

    In recent years, considerable effort has been placed on developing efficient and robust solution algorithms for the incompressible Navier-Stokes equations based on preconditioned Krylov methods. These include physics-based methods, such as SIMPLE, and purely algebraic preconditioners based on the approximation of the Schur complement. All these techniques can be represented as approximate block factorization (ABF) type preconditioners. The goal is to decompose the application of the preconditioner into simplified sub-systems in which scalable multi-level type solvers can be applied. In this paper we develop a taxonomy of these ideas based on an adaptation of a generalized approximate factorization of the Navier-Stokes system first presented in [25]. This taxonomy illuminates the similarities and differences among these preconditioners and the central role played by efficient approximation of certain Schur complement operators. We then present a parallel computational study that examines the performance of these methods and compares them to an additive Schwarz domain decomposition (DD) algorithm. Results are presented for two- and three-dimensional steady state problems for enclosed domains and inflow/outflow systems on both structured and unstructured meshes. The numerical experiments are performed using MPSalsa, a stabilized finite element code.
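    The ABF idea is concrete enough to sketch: precondition the saddle-point system with a block-triangular solve whose Schur complement is approximated through diag(F), the SIMPLE-style choice mentioned above. The random blocks below stand in for a real discretization; sizes and scaling are arbitrary assumptions.

    ```python
    # Hedged sketch of a SIMPLE-style approximate block factorization
    # preconditioner for K = [[F, B^T], [B, 0]], applied inside GMRES.
    import numpy as np
    import scipy.sparse.linalg as spla

    rng = np.random.default_rng(1)
    n, m = 60, 20
    F = 4 * np.eye(n) + 0.5 * rng.normal(size=(n, n)) / np.sqrt(n)
    B = rng.normal(size=(m, n)) / np.sqrt(n)
    K = np.block([[F, B.T], [B, np.zeros((m, m))]])

    # approximate Schur complement: -B diag(F)^{-1} B^T
    S_hat = -B @ np.diag(1.0 / np.diag(F)) @ B.T

    def apply_prec(r):
        """Block upper-triangular solve with F and S_hat."""
        r_u, r_p = r[:n], r[n:]
        p = np.linalg.solve(S_hat, r_p)         # approximate pressure solve
        u = np.linalg.solve(F, r_u - B.T @ p)   # velocity solve
        return np.concatenate([u, p])

    M = spla.LinearOperator(K.shape, matvec=apply_prec)
    x, info = spla.gmres(K, rng.normal(size=n + m), M=M)
    print(info)   # 0 means GMRES converged
    ```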

  2. Basis adaptation and domain decomposition for steady partial differential equations with random coefficients

    DOE PAGES

    Tipireddy, R.; Stinis, P.; Tartakovsky, A. M.

    2017-09-04

    In this paper, we present a novel approach for solving steady-state stochastic partial differential equations (PDEs) with high-dimensional random parameter space. The proposed approach combines spatial domain decomposition with basis adaptation for each subdomain. The basis adaptation is used to address the curse of dimensionality by constructing an accurate low-dimensional representation of the stochastic PDE solution (probability density function and/or its leading statistical moments) in each subdomain. Restricting the basis adaptation to a specific subdomain affords finding a locally accurate solution. Then, the solutions from all of the subdomains are stitched together to provide a global solution. We support our construction with numerical experiments for a steady-state diffusion equation with a random spatially dependent coefficient. Lastly, our results show that highly accurate global solutions can be obtained with significantly reduced computational costs.

  3. Basis adaptation and domain decomposition for steady-state partial differential equations with random coefficients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tipireddy, R.; Stinis, P.; Tartakovsky, A. M.

    We present a novel approach for solving steady-state stochastic partial differential equations (PDEs) with high-dimensional random parameter space. The proposed approach combines spatial domain decomposition with basis adaptation for each subdomain. The basis adaptation is used to address the curse of dimensionality by constructing an accurate low-dimensional representation of the stochastic PDE solution (probability density function and/or its leading statistical moments) in each subdomain. Restricting the basis adaptation to a specific subdomain affords finding a locally accurate solution. Then, the solutions from all of the subdomains are stitched together to provide a global solution. We support our construction with numerical experiments for a steady-state diffusion equation with a random spatially dependent coefficient. Our results show that highly accurate global solutions can be obtained with significantly reduced computational costs.

  4. An hp-adaptivity and error estimation for hyperbolic conservation laws

    NASA Technical Reports Server (NTRS)

    Bey, Kim S.

    1995-01-01

    This paper presents an hp-adaptive discontinuous Galerkin method for linear hyperbolic conservation laws. A priori and a posteriori error estimates are derived in mesh-dependent norms which reflect the dependence of the approximate solution on the element size (h) and the degree (p) of the local polynomial approximation. The a posteriori error estimate, based on the element residual method, provides bounds on the actual global error in the approximate solution. The adaptive strategy is designed to deliver an approximate solution with the specified level of error in three steps. The a posteriori estimate is used to assess the accuracy of a given approximate solution and the a priori estimate is used to predict the mesh refinements and polynomial enrichment needed to deliver the desired solution. Numerical examples demonstrate the reliability of the a posteriori error estimates and the effectiveness of the hp-adaptive strategy.
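    The three-step strategy translates directly into an adapt loop: estimate, then choose per element between raising p (smooth solution) and splitting in h (rough solution). The error model and smoothness test in this sketch are toy stand-ins, not the paper's estimators.

    ```python
    # Schematic hp-adaptation driven by a toy a priori rate err ~ h^p.
    # Element, the error model and the smoothness test are stand-ins.
    from dataclasses import dataclass

    @dataclass
    class Element:
        h: float
        p: int = 1
        def split(self):
            return [Element(self.h / 2, self.p), Element(self.h / 2, self.p)]

    def estimate_error(e):
        return e.h ** e.p            # toy a priori error model

    def is_smooth(e):
        return e.p < 4               # stand-in smoothness indicator

    def hp_adapt(elements, target):
        while sum(map(estimate_error, elements)) > target:
            worst = max(elements, key=estimate_error)
            if is_smooth(worst):
                worst.p += 1                     # p-enrichment
            else:
                elements.remove(worst)           # h-refinement
                elements += worst.split()
        return elements

    mesh = hp_adapt([Element(0.5), Element(0.5)], target=1e-3)
    print(len(mesh), max(e.p for e in mesh))
    ```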

  5. A framework for grand scale parallelization of the combined finite discrete element method in 2d

    NASA Astrophysics Data System (ADS)

    Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.

    2014-09-01

    Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single-processor realm. In this work a hardware-independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.

  6. Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

    NASA Technical Reports Server (NTRS)

    Hsia, T. C.; Lu, G. Z.; Han, W. H.

    1987-01-01

    In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. Such an architecture is developed here for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.

  7. The use of solution adaptive grids in solving partial differential equations

    NASA Technical Reports Server (NTRS)

    Anderson, D. A.; Rai, M. M.

    1982-01-01

    The grid point distribution used in solving a partial differential equation using a numerical method has a substantial influence on the quality of the solution. An adaptive grid which adjusts as the solution changes provides the best results when the number of grid points available for use during the calculation is fixed. Basic concepts used in generating and applying adaptive grids are reviewed in this paper, and examples illustrating applications of these concepts are presented.

  8. Optimal parallel solution of sparse triangular systems

    NASA Technical Reports Server (NTRS)

    Alvarado, Fernando L.; Schreiber, Robert

    1990-01-01

    A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-hand sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or back substitution. Applications are to iterative solvers with triangular preconditioners, to structural analysis, and to power systems applications, where there may be many right-hand sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allows for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
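    The paper's factored-inverse partitioning is more refined than what fits here, but the simplest way to see where the parallelism of a sparse triangular solve lives is level scheduling, a related technique: rows whose dependencies are already satisfied form a level and can be solved simultaneously. The dense toy matrix below is only for illustration.

    ```python
    # Level scheduling for Lx = b: not the paper's partitioned-inverse
    # method, but the standard way to expose triangular-solve parallelism.
    import numpy as np

    def levels(L):
        """level[i] = 1 + max(level[j]) over nonzeros L[i, j], j < i."""
        n = L.shape[0]
        lev = np.zeros(n, dtype=int)
        for i in range(n):
            deps = [lev[j] for j in range(i) if L[i, j] != 0]
            lev[i] = 1 + max(deps, default=-1)
        return lev

    def solve_by_levels(L, b):
        x = np.zeros_like(b, dtype=float)
        lev = levels(L)
        for l in range(lev.max() + 1):
            rows = np.where(lev == l)[0]      # independent rows:
            for i in rows:                    # one parallel step
                x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
        return x

    L = np.array([[2., 0, 0, 0], [1, 2, 0, 0], [0, 0, 2, 0], [0, 1, 1, 2]])
    b = np.array([2., 3, 4, 7])
    print(solve_by_levels(L, b), np.linalg.solve(L, b))   # should match
    ```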

  9. A domain specific language for performance portable molecular dynamics algorithms

    NASA Astrophysics Data System (ADS)

    Saunders, William Robert; Grant, James; Müller, Eike Hermann

    2018-03-01

    Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be experts both in their own domain (physics/chemistry/biology) and specialists in the low level parallelisation and optimisation of their codes. To address this challenge, we describe a "Separation of Concerns" approach for the development of parallel and optimised MD codes: the science specialist writes code at a high abstraction level in a domain specific language (DSL), which is then translated into efficient computer code by a scientific programmer. In a related context, an abstraction for the solution of partial differential equations with grid based methods has recently been implemented in the (Py)OP2 library. Inspired by this approach, we develop a Python code generation system for molecular dynamics simulations on different parallel architectures, including massively parallel distributed memory systems and GPUs. We demonstrate the efficiency of the auto-generated code by studying its performance and scalability on different hardware and compare it to other state-of-the-art simulation packages. With growing data volumes the extraction of physically meaningful information from the simulation becomes increasingly challenging and requires equally efficient implementations. A particular advantage of our approach is the easy expression of such analysis algorithms. We consider two popular methods for deducing the crystalline structure of a material from the local environment of each atom, show how they can be expressed in our abstraction and implement them in the code generation framework.
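    The separation-of-concerns idea can be caricatured in a few lines: the scientist supplies only a per-pair kernel, and the framework owns the execution strategy. All API names below are invented for this sketch; numpy's vectorized sweep stands in for the generated OpenMP/GPU code.

    ```python
    # Toy "kernel + framework" split in the spirit of the paper's DSL.
    # PairLoop and the kernel signature are invented for illustration.
    import numpy as np

    class PairLoop:
        """Applies kernel(r2) over all distinct particle pairs; how the
        sweep executes is the framework's business, not the scientist's."""
        def __init__(self, kernel):
            self.kernel = kernel
        def execute(self, pos):
            diff = pos[:, None, :] - pos[None, :, :]
            r2 = (diff ** 2).sum(-1)
            np.fill_diagonal(r2, np.inf)      # exclude self-pairs
            return self.kernel(r2).sum(axis=1)

    def lj_energy(r2, eps=1.0, sigma=1.0):    # the "science" code
        sr6 = (sigma ** 2 / r2) ** 3
        return 4 * eps * (sr6 ** 2 - sr6)

    pos = np.random.default_rng(0).uniform(0, 5, size=(100, 3))
    per_particle_u = PairLoop(lj_energy).execute(pos)
    print(per_particle_u.sum() / 2)           # total potential energy
    ```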

  10. A video event trigger for high frame rate, high resolution video technology

    NASA Astrophysics Data System (ADS)

    Williams, Glenn L.

    1991-12-01

    When video replaces film the digitized video data accumulates very rapidly, leading to a difficult and costly data storage problem. One solution exists for cases when the video images represent continuously repetitive 'static scenes' containing negligible activity, occasionally interrupted by short events of interest. Minutes or hours of redundant video frames can be ignored, and not stored, until activity begins. A new, highly parallel digital state machine generates a digital trigger signal at the onset of a video event. High capacity random access memory storage coupled with newly available fuzzy logic devices permits the monitoring of a video image stream for long term or short term changes caused by spatial translation, dilation, appearance, disappearance, or color change in a video object. Pretrigger and post-trigger storage techniques are then adaptable for archiving the digital stream from only the significant video images.

  11. Fitting neuron models to spike trains.

    PubMed

    Rossant, Cyrille; Goodman, Dan F M; Fontaine, Bertrand; Platkiewicz, Jonathan; Magnusson, Anna K; Brette, Romain

    2011-01-01

    Computational modeling is increasingly used to understand the function of neural circuits in systems neuroscience. These studies require models of individual neurons with realistic input-output properties. Recently, it was found that spiking models can accurately predict the precisely timed spike trains produced by cortical neurons in response to somatically injected currents, if properly fitted. This requires fitting techniques that are efficient and flexible enough to easily test different candidate models. We present a generic solution, based on the Brian simulator (a neural network simulator in Python), which allows the user to define and fit arbitrary neuron models to electrophysiological recordings. It relies on vectorization and parallel computing techniques to achieve efficiency. We demonstrate its use on neural recordings in the barrel cortex and in the auditory brainstem, and confirm that simple adaptive spiking models can accurately predict the response of cortical neurons. Finally, we show how a complex multicompartmental model can be reduced to a simple effective spiking model.
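    The vectorization trick the authors rely on is to integrate many candidate models as one array operation and score them all against the recorded train at once. The sketch below does this for a toy leaky integrate-and-fire parameter sweep; the parameters, input current and count-matching criterion are simplifications invented here, not the Brian fitting toolbox.

    ```python
    # Vectorized model fitting: n candidate LIF models advance in one
    # numpy update per time step and are scored together.
    import numpy as np

    rng = np.random.default_rng(0)
    dt, T, n = 0.1, 2000, 5000                # ms, steps, candidate models
    I = np.interp(np.arange(T), np.arange(0, T, 50),
                  rng.uniform(0, 3, T // 50)) # shared injected current

    tau = rng.uniform(5, 30, n)               # candidate time constants
    v_th = rng.uniform(0.5, 1.5, n)           # candidate thresholds
    v = np.zeros(n)
    counts = np.zeros(n)
    for t in range(T):
        v += dt * (-v + I[t]) / tau           # all models in one update
        fired = v >= v_th
        counts += fired
        v[fired] = 0.0                        # reset after a spike

    target_count = 30                         # from the recorded train
    best = np.argmin(np.abs(counts - target_count))
    print(tau[best], v_th[best], counts[best])
    ```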

  12. History, applications, and challenges of immune repertoire research.

    PubMed

    Liu, Xiao; Wu, Jinghua

    2018-02-27

    The diversity of T and B cells in terms of their receptor sequences is huge in the vertebrate immune system and provides broad protection against the vast diversity of pathogens. The immune repertoire is defined as the sum of the T cell receptors and B cell receptors (also named immunoglobulins) that make up the organism's adaptive immune system. Before the emergence of high-throughput sequencing, studies of the immune repertoire were limited by underdeveloped methodologies, since it was impossible to capture the whole picture with low-throughput tools. Massively parallel sequencing technology suits immune repertoire research perfectly. In this article, we review the history of immune repertoire studies, in terms of both technologies and research applications. In particular, we discuss several challenges in this field and highlight the efforts to develop potential solutions in the era of high-throughput sequencing of the immune repertoire.

  13. Hemimetabolous genomes reveal molecular basis of termite eusociality.

    PubMed

    Harrison, Mark C; Jongepier, Evelien; Robertson, Hugh M; Arning, Nicolas; Bitard-Feildel, Tristan; Chao, Hsu; Childers, Christopher P; Dinh, Huyen; Doddapaneni, Harshavardhan; Dugan, Shannon; Gowin, Johannes; Greiner, Carolin; Han, Yi; Hu, Haofu; Hughes, Daniel S T; Huylmans, Ann-Kathrin; Kemena, Carsten; Kremer, Lukas P M; Lee, Sandra L; Lopez-Ezquerra, Alberto; Mallet, Ludovic; Monroy-Kuhn, Jose M; Moser, Annabell; Murali, Shwetha C; Muzny, Donna M; Otani, Saria; Piulachs, Maria-Dolors; Poelchau, Monica; Qu, Jiaxin; Schaub, Florentine; Wada-Katsumata, Ayako; Worley, Kim C; Xie, Qiaolin; Ylla, Guillem; Poulsen, Michael; Gibbs, Richard A; Schal, Coby; Richards, Stephen; Belles, Xavier; Korb, Judith; Bornberg-Bauer, Erich

    2018-03-01

    Around 150 million years ago, eusocial termites evolved from within the cockroaches, 50 million years before eusocial Hymenoptera, such as bees and ants, appeared. Here, we report the 2-Gb genome of the German cockroach, Blattella germanica, and the 1.3-Gb genome of the drywood termite Cryptotermes secundus. We show evolutionary signatures of termite eusociality by comparing the genomes and transcriptomes of three termites and the cockroach against the background of 16 other eusocial and non-eusocial insects. Dramatic adaptive changes in genes underlying the production and perception of pheromones confirm the importance of chemical communication in the termites. These are accompanied by major changes in gene regulation and the molecular evolution of caste determination. Many of these results parallel molecular mechanisms of eusocial evolution in Hymenoptera. However, the specific solutions are remarkably different, thus revealing a striking case of convergence in one of the major evolutionary transitions in biological complexity.

  14. A video event trigger for high frame rate, high resolution video technology

    NASA Technical Reports Server (NTRS)

    Williams, Glenn L.

    1991-01-01

    When video replaces film the digitized video data accumulates very rapidly, leading to a difficult and costly data storage problem. One solution exists for cases when the video images represent continuously repetitive 'static scenes' containing negligible activity, occasionally interrupted by short events of interest. Minutes or hours of redundant video frames can be ignored, and not stored, until activity begins. A new, highly parallel digital state machine generates a digital trigger signal at the onset of a video event. High capacity random access memory storage coupled with newly available fuzzy logic devices permits the monitoring of a video image stream for long term or short term changes caused by spatial translation, dilation, appearance, disappearance, or color change in a video object. Pretrigger and post-trigger storage techniques are then adaptable for archiving the digital stream from only the significant video images.

  15. An iterative method for systems of nonlinear hyperbolic equations

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.

    1989-01-01

    An iterative algorithm for the efficient solution of systems of nonlinear hyperbolic equations is presented. Parallelism is evident at several levels. In the formation of the iteration, the equations are decoupled, thereby providing large-grain parallelism. Parallelism may also be exploited within the solves for each equation. Convergence of the iteration is established via a bounding function argument. Experimental results in two dimensions are presented.

  16. Experimental and numerical research on the aerodynamics of unsteady moving aircraft

    NASA Astrophysics Data System (ADS)

    Bergmann, Andreas; Huebner, Andreas; Loeser, Thomas

    2008-02-01

    For the experimental determination of dynamic wind tunnel data, a new combined motion test capability was developed at the German-Dutch Wind Tunnels DNW for their 3 m Low Speed Wind Tunnel NWB in Braunschweig, Germany, using a unique six-degree-of-freedom test rig called the 'Model Positioning Mechanism' (MPM) as an improved successor to the older systems. With that cutting-edge device, several transport aircraft configurations, including a blended wing body configuration, were tested in different modes of oscillatory motion (roll, pitch and yaw), as well as delta-wing geometries like the X-31 equipped with remote-controlled rudders and flaps to be able to simulate realistic flight maneuvers, e.g., a Dutch roll. This paper describes the motivation behind these tests and the test setup, and in addition gives a short introduction to time-accurate maneuver-testing capabilities incorporating models with remote-controlled control surfaces. Furthermore, the adaptation of numerical methods for the prediction of dynamic derivatives is described, and some examples with the DLR-F12 configuration are given. The calculations are based on RANS solutions using a finite volume parallel solution algorithm with an unstructured discretization concept (the DLR TAU code).

  17. High performance genetic algorithm for VLSI circuit partitioning

    NASA Astrophysics Data System (ADS)

    Dinu, Simona

    2016-12-01

    Partitioning is one of the biggest challenges in computer-aided design for VLSI circuits (very large-scale integrated circuits). This work addresses the min-cut balanced circuit partitioning problem: dividing the graph that models the circuit into k almost equal-sized sub-graphs while minimizing the number of edges cut, i.e., the number of edges connecting the sub-graphs. The problem may be formulated as a combinatorial optimization problem. Studies in the literature have shown the problem to be NP-hard, and thus it is important to design an efficient heuristic algorithm to solve it. The approach proposed in this study is a parallel implementation of a genetic algorithm, namely an island model. The information exchange between the evolving subpopulations is modeled using a fuzzy controller, which determines an optimal balance between exploration and exploitation of the solution space. The results of simulations show that the proposed algorithm outperforms the standard sequential genetic algorithm both in terms of solution quality and convergence speed. As a direction for future study, this research can be extended to incorporate local search operators carrying problem-specific knowledge. In addition, the adaptive configuration of mutation and crossover rates is another avenue for future research.
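    A bare-bones island model is easy to sketch: several subpopulations evolve independently and periodically exchange their best individuals. In the version below, the paper's fuzzy migration controller is replaced by a fixed migration interval, and the random graph, rates and penalty weight are illustrative assumptions.

    ```python
    # Island-model GA for min-cut balanced bisection on a toy graph.
    # All hyperparameters here are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes = 60
    edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
             if rng.random() < 0.08]

    def cost(part):
        """Edges cut plus a penalty for imbalance between the halves."""
        cut = sum(part[i] != part[j] for i, j in edges)
        return cut + 2 * abs(part.sum() - n_nodes // 2)

    def evolve(pop, gens=30):
        for _ in range(gens):
            fit = np.array([cost(p) for p in pop])
            pop = pop[np.argsort(fit)]                 # best first
            for k in range(len(pop) // 2, len(pop)):   # replace worst half
                a, b = pop[rng.integers(0, len(pop) // 2, 2)]
                cx = rng.integers(1, n_nodes)          # one-point crossover
                child = np.concatenate([a[:cx], b[cx:]])
                flip = rng.random(n_nodes) < 0.02      # mutation
                pop[k] = np.where(flip, 1 - child, child)
        return pop

    islands = [rng.integers(0, 2, (20, n_nodes)) for _ in range(4)]
    for epoch in range(5):
        islands = [evolve(p) for p in islands]         # run concurrently
        ring = [p[0].copy() for p in islands]          # migrate the best
        for i, p in enumerate(islands):
            p[-1] = ring[(i - 1) % len(islands)]
    print(min(cost(p[0]) for p in islands))            # best cut found
    ```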

  18. Exact coherent structures in an asymptotically reduced description of parallel shear flows

    NASA Astrophysics Data System (ADS)

    Beaume, Cédric; Knobloch, Edgar; Chini, Gregory P.; Julien, Keith

    2015-02-01

    A reduced description of shear flows motivated by the Reynolds number scaling of lower-branch exact coherent states in plane Couette flow (Wang J, Gibson J and Waleffe F 2007 Phys. Rev. Lett. 98 204501) is constructed. Exact time-independent nonlinear solutions of the reduced equations corresponding to both lower and upper branch states are found for a sinusoidal, body-forced shear flow. The lower branch solution is characterized by fluctuations that vary slowly along the critical layer while the upper branch solutions display a bimodal structure and are more strongly focused on the critical layer. The reduced equations provide a rational framework for investigations of subcritical spatiotemporal patterns in parallel shear flows.

  19. The development of a revised version of multi-center molecular Ornstein-Zernike equation

    NASA Astrophysics Data System (ADS)

    Kido, Kentaro; Yokogawa, Daisuke; Sato, Hirofumi

    2012-04-01

    Ornstein-Zernike (OZ)-type theory is a powerful tool to obtain the three-dimensional solvent distribution around a solute molecule. Recently, we proposed the multi-center molecular OZ method, which is suitable for parallel computation of 3D solvation structure. The distribution function in this method consists of two components, namely reference and residue parts. Several types of functions were examined as the reference part to investigate the numerical robustness of the method. As benchmarks, the method is applied to water, benzene in aqueous solution, and a single-walled carbon nanotube in chloroform solution. The results indicate that full parallelization is achieved by utilizing the newly proposed reference functions.

  20. A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Doong, Shing H.

    2009-01-01

    The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koniges, A.E.

    The author describes the new T3D parallel computer at NERSC. The adaptive mesh ICF3D code is one of the current applications being ported and developed for use on the T3D. It has been stressed in other papers in these proceedings that the development environment and tools available on the parallel computer are similar to those planned for the future, including networks of workstations.

  2. The island dynamics model on parallel quadtree grids

    NASA Astrophysics Data System (ADS)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.

  3. Parallel design of JPEG-LS encoder on graphics processing units

    NASA Astrophysics Data System (ADS)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth, and many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve the compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels, and the run-length coding has to be performed sequentially. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use a block-parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance, with a 26.3x speedup over the original CPU code.
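    Of the CUDA techniques listed, the parallel prefix sum is the most self-contained: it is what lets every compressed block learn its output offset without a serial pass. Below is the textbook Hillis-Steele scan, with numpy standing in for a CUDA kernel; the byte counts are made up.

    ```python
    # Hillis-Steele inclusive scan: log2(n) passes, each one a fully
    # data-parallel vector operation (here a numpy slice update).
    import numpy as np

    def inclusive_scan(a):
        a = a.astype(np.int64).copy()
        shift = 1
        while shift < len(a):
            a[shift:] += a[:-shift].copy()   # one data-parallel pass
            shift *= 2
        return a

    block_bytes = np.array([17, 3, 9, 25, 4, 11, 8, 2])
    offsets = inclusive_scan(block_bytes)
    print(offsets, np.cumsum(block_bytes))   # identical results
    ```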

  4. Hypercube matrix computation task

    NASA Technical Reports Server (NTRS)

    Calalo, R.; Imbriale, W.; Liewer, P.; Lyons, J.; Manshadi, F.; Patterson, J.

    1987-01-01

    The Hypercube Matrix Computation (1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms would differ, thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency-domain method-of-moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time-domain finite difference solution of Maxwell's equations for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate their performance. First, the problem size possible on the hypercube, with 128 megabytes of memory in a 32-node configuration, was compared with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was analyzed for the computational speedup attained by the parallel architecture.

  5. Symmetric and asymmetric capillary bridges between a rough surface and a parallel surface.

    PubMed

    Wang, Yongxin; Michielsen, Stephen; Lee, Hoon Joo

    2013-09-03

    Although the formation of a capillary bridge between two parallel surfaces has been extensively studied, the majority of research has described only symmetric capillary bridges between two smooth surfaces. In this work, an instrument was built to form a capillary bridge by squeezing a liquid drop on one surface with another surface. An analytical solution that describes the shape of symmetric capillary bridges joining two smooth surfaces has been extended to bridges that are asymmetric about the midplane and to rough surfaces. The solution, given by elliptical integrals of the first and second kind, is consistent with a constant Laplace pressure over the entire surface and has been verified for water, Kaydol, and dodecane drops forming symmetric and asymmetric bridges between parallel smooth surfaces. This solution has been applied to asymmetric capillary bridges between a smooth surface and a rough fabric surface as well as symmetric bridges between two rough surfaces. These solutions have been experimentally verified, and good agreement has been found between predicted and experimental profiles for small drops where the effect of gravity is negligible. Finally, a protocol for determining the profile from the volume and height of the capillary bridge has been developed and experimentally verified.

  6. Some Considerations in Maintaining Adaptive Test Item Pools.

    ERIC Educational Resources Information Center

    Stocking, Martha L.

    The construction of parallel editions of conventional tests for purposes of test security while maintaining score comparability has always been a recognized and difficult problem in psychometrics and test construction. The introduction of new modes of test construction, e.g., adaptive testing, changes the nature of the problem, but does not make…

  7. Paper-based enzymatic microfluidic fuel cell: From a two-stream flow device to a single-stream lateral flow strip

    NASA Astrophysics Data System (ADS)

    González-Guerrero, Maria José; del Campo, F. Javier; Esquivel, Juan Pablo; Giroud, Fabien; Minteer, Shelley D.; Sabaté, Neus

    2016-09-01

    This work presents a first approach towards the development of a cost-effective enzymatic paper-based glucose/O2 microfluidic fuel cell in which fluid transport is based on capillary action. A first fuel cell configuration consists of a Y-shaped paper device with the fuel and the oxidant flowing in parallel over carbon paper electrodes modified with bioelectrocatalytic enzymes. The anode consists of a ferrocenium-based polyethyleneimine polymer linked to glucose oxidase (GOx/Fc-C6-LPEI), while the cathode contains a mixture of laccase, anthracene-modified multiwall carbon nanotubes, and tetrabutylammonium bromide-modified Nafion (MWCNTs/laccase/TBAB-Nafion). Subsequently, the Y-shaped configuration is improved to use a single solution containing both the anolyte and the catholyte. Thus, the pHs of the fuel and oxidant electrolytes are adapted to an intermediate pH of 5.5. Finally, the fuel cell is run with this single solution, obtaining a maximum open-circuit voltage of 0.55 ± 0.04 V and maximum current and power densities of 225 ± 17 μA cm-2 and 24 ± 5 μW cm-2, respectively. Hence, a power source closer to a commercial application (similar to conventional lateral flow test strips) is developed and successfully operated. This system can be used to supply the energy required to power microelectronics demanding low power consumption.

  8. [Assessing the benefits of digital health solutions in the societal reimbursement context].

    PubMed

    Albrecht, Urs-Vito; Kuhn, Bertolt; Land, Jörg; Amelung, Volker E; von Jan, Ute

    2018-03-01

    For a number of reasons, achieving reimbursability for digital health products has so far proven difficult. Demonstrating the benefits of the technology is the main hurdle in this context. The generally accepted evaluation processes, especially parallel group comparisons in randomized controlled trials (RCTs) for (clinical) benefit assessment, are primarily intended to deal with questions of (added) medical benefit. In contrast to drugs or classical medical devices, users of digital health solutions often profit from gaining autonomy, increased awareness and mindfulness, better transparency in the provision of care, and improved comfort, although there are also digital solutions with an interventional character targeting clinical outcomes (e.g., for indications such as anorexia or depression). Commonly accepted methods for evaluating (clinical) benefits primarily rely on medical outcomes, such as morbidity and mortality, but do not adequately consider additional benefits unique to digital health. The challenge is therefore to develop evaluation designs that respect the particularities of digital health without reducing the validity of the evaluations (especially with respect to safety). There is an increasing need for concepts that combine continuous feedback loops for adapting and improving an application with the generation of sufficient evidence for complex benefit assessments. This approach may help improve risk-benefit assessments of digital health when it comes to implementing digital innovations in healthcare.

  9. Electrooptical adaptive switching network for the hypercube computer

    NASA Technical Reports Server (NTRS)

    Chow, E.; Peterson, J.

    1988-01-01

    An all-optical network design for the hyperswitch network using regular free-space interconnects between electronic processor nodes is presented. The adaptive routing model used is described, and an adaptive routing control example is presented. The design demonstrates that existing electrooptical techniques are sufficient for implementing efficient parallel architectures without the need for more complex means of implementing arbitrary interconnection schemes. The electrooptical hyperswitch network significantly improves the communication performance of the hypercube computer.

  10. Advances in the U.S. Navy Non-hydrostatic Unified Model of the Atmosphere (NUMA): LES as a Stabilization Methodology for High-Order Spectral Elements in the Simulation of Deep Convection

    NASA Astrophysics Data System (ADS)

    Marras, Simone; Giraldo, Frank

    2015-04-01

    The prediction of extreme weather sufficiently ahead of its occurrence impacts society as a whole and coastal communities specifically (e.g. Hurricane Sandy that impacted the eastern seaboard of the U.S. in the fall of 2012). With the final goal of simulating hurricanes at very high resolution and numerical accuracy, we have been developing the Non-hydrostatic Unified Model of the Atmosphere (NUMA) to solve the Euler and Navier-Stokes equations by arbitrary high-order element-based Galerkin methods on massively parallel computers. NUMA is a unified model with respect to the following criteria: (a) it is based on unified numerics in that element-based Galerkin methods allow the user to choose between continuous (spectral elements, CG) or discontinuous Galerkin (DG) methods and from a large spectrum of time integrators, (b) it is unified across scales in that it can solve flow in limited-area mode (flow in a box) or in global mode (flow on the sphere). NUMA is the dynamical core that powers the U.S. Naval Research Laboratory's next-generation global weather prediction system NEPTUNE (Navy's Environmental Prediction sysTem Utilizing the NUMA corE). Because the solution of the Euler equations by high order methods is prone to instabilities that must be damped in some way, we approach the problem of stabilization via an adaptive Large Eddy Simulation (LES) scheme meant to treat such instabilities by modeling the sub-grid scale features of the flow. The novelty of our effort lies in the extension to high order spectral elements for low Mach number stratified flows of a method that was originally designed for low order, adaptive finite elements in the high Mach number regime [1]. The Euler equations are regularized by means of a dynamically adaptive stress tensor that is proportional to the residual of the unperturbed equations. Its effect is negligible where the solution is sufficiently smooth, whereas it increases elsewhere, with a direct contribution to the stabilization of the otherwise oscillatory solution. As a first step toward the Large Eddy Simulation of a hurricane, we verify the model via a high-order and high-resolution idealized simulation of deep convection on the sphere. References: [1] M. Nazarov and J. Hoffman (2013) Residual-based artificial viscosity for simulation of turbulent compressible flow using adaptive finite element methods, Int. J. Numer. Methods Fluids, 71:339-357.
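
    The residual-based stabilization described above can be illustrated on a scalar model problem. The sketch below applies a viscosity proportional to the residual of the unperturbed equation, capped by a first-order bound, to 1-D Burgers' equation; the constants C_E and C_MAX are illustrative assumptions, not values from NUMA, and the low-order finite differences stand in for the spectral element discretization.

```python
# Residual-based artificial viscosity (in the spirit of Nazarov & Hoffman,
# 2013) for Burgers' equation u_t + u u_x = 0 on a periodic domain: the
# viscosity is large only where the PDE residual is large (near shocks).
import numpy as np

N, L, C_E, C_MAX = 400, 2 * np.pi, 1.0, 0.5
h = L / N
x = np.linspace(0.0, L, N, endpoint=False)
u = np.sin(x) + 0.5
dt = 0.2 * h / np.abs(u).max()
u_old = u.copy()

for _ in range(600):
    ux = (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)        # central u_x
    uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / h**2  # central u_xx
    resid = (u - u_old) / dt + u * ux                      # PDE residual
    norm = np.abs(u - u.mean()).max() + 1e-12
    nu = np.minimum(C_MAX * h * np.abs(u), C_E * h**2 * np.abs(resid) / norm)
    u_old = u.copy()
    u = u + dt * (-u * ux + nu * uxx)                      # stabilized update

print("max|u|:", np.abs(u).max())
```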

  11. Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields

    NASA Astrophysics Data System (ADS)

    Forsythe, Victoriya V.; Makarevich, Roman A.

    2016-11-01

    An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.

  12. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

    2010-09-01

    The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance.

    Program summary
    Program title: NWChem
    Catalogue identifier: AEGI_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Open Source Educational Community License
    No. of lines in distributed program, including test data, etc.: 11 709 543
    No. of bytes in distributed program, including test data, etc.: 680 696 106
    Distribution format: tar.gz
    Programming language: Fortran 77, C
    Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines
    Operating system: Linux, OS X, Windows
    Has the code been vectorised or parallelized?: Code is parallelized
    Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13
    Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of the many-electron Hamiltonian, analysis of the potential energy surface, and dynamics.
    Solution method: Ground and excited solutions of the many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions or a combination thereof with classical descriptions are then used to analyze the potential energy surface and perform dynamical simulations.
    Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or Email is requested. Instead a html file giving details of how the program can be obtained is sent.
    Running time: Running time depends on the size of the chemical system, complexity of the method, number of cpu's and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulation on hundreds of atoms.

  13. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
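
    The packing step lends itself to a compact illustration. The sketch below uses the classic first-fit-decreasing bin-packing heuristic as a stand-in for the paper's packing algorithm (which may differ): equations with estimated per-frame costs are assigned to as few processors as possible without any processor exceeding the update-interval budget.

```python
# First-fit-decreasing packing of equations into a minimum number of
# processors, each limited by a per-frame compute-time budget.
def pack_equations(costs, budget):
    """costs: per-equation execution-time estimates; budget: time per frame."""
    processors = []  # each entry: [remaining_budget, [equation indices]]
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for i in order:
        if costs[i] > budget:
            raise ValueError(f"equation {i} cannot meet the frame budget")
        for proc in processors:
            if proc[0] >= costs[i]:        # first processor with enough slack
                proc[0] -= costs[i]
                proc[1].append(i)
                break
        else:                              # none fits: allocate a new one
            processors.append([budget - costs[i], [i]])
    return [p[1] for p in processors]

# Example: 8 state equations with unequal costs, 10 ms frame budget.
assignment = pack_equations([4.1, 3.5, 3.2, 2.8, 2.0, 1.7, 1.1, 0.6], 10.0)
for k, eqs in enumerate(assignment):
    print(f"processor {k}: equations {eqs}")
```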

  14. Longitudinal trends in climate drive flowering time clines in North American Arabidopsis thaliana.

    PubMed

    Samis, Karen E; Murren, Courtney J; Bossdorf, Oliver; Donohue, Kathleen; Fenster, Charles B; Malmberg, Russell L; Purugganan, Michael D; Stinchcombe, John R

    2012-06-01

    Introduced species frequently show geographic differentiation, and when differentiation mirrors the ancestral range, it is often taken as evidence of adaptive evolution. The mouse-ear cress (Arabidopsis thaliana) was introduced to North America from Eurasia 150-200 years ago, providing an opportunity to study parallel adaptation in a genetic model organism. Here, we test for clinal variation in flowering time using 199 North American (NA) accessions of A. thaliana, and evaluate the contributions of major flowering time genes FRI, FLC, and PHYC as well as potential ecological mechanisms underlying differentiation. We find evidence for substantial within population genetic variation in quantitative traits and flowering time, and putatively adaptive longitudinal differentiation, despite low levels of variation at FRI, FLC, and PHYC and genome-wide reductions in population structure relative to Eurasian (EA) samples. The observed longitudinal cline in flowering time in North America is parallel to an EA cline, robust to the effects of population structure, and associated with geographic variation in winter precipitation and temperature. We detected major effects of FRI on quantitative traits associated with reproductive fitness, although the haplotype associated with higher fitness remains rare in North America. Collectively, our results suggest the evolution of parallel flowering time clines through novel genetic mechanisms.

  15. Increased performance in the short-term water demand forecasting through the use of a parallel adaptive weighting strategy

    NASA Astrophysics Data System (ADS)

    Sardinha-Lourenço, A.; Andrade-Campos, A.; Antunes, A.; Oliveira, M. S.

    2018-03-01

    Recent research on water demand short-term forecasting has shown that models using univariate time series based on historical data are useful and can be combined with other prediction methods to reduce errors. Water demand in drinking water distribution networks is largely repetitive in nature and, under similar meteorological conditions and consumer profiles, allows the development of a heuristic forecast model that, combined with other autoregressive models, can provide reliable forecasts. In this study, a parallel adaptive weighting strategy for forecasting water consumption over the next 24-48 h, using univariate time series of potable water consumption, is proposed. Two Portuguese potable water distribution networks are used as case studies where the only input data are the consumption of water and the national calendar. For the development of the strategy, the Autoregressive Integrated Moving Average (ARIMA) method and a short-term forecast heuristic algorithm are used. Simulations with the model showed that, when using a parallel adaptive weighting strategy, the prediction error can be reduced by 15.96% and the average error by 9.20%. This reduction is important in the control and management of water supply systems. The proposed methodology can be extended to other forecast methods, especially when it comes to the availability of multiple forecast models.
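
    One simple way to realize such an adaptive weighting, sketched below under assumptions of our own (inverse recent-MAE weights over a 24-sample window, which may differ from the authors' scheme), is to run both forecasters in parallel and weight each by the inverse of its recent mean absolute error, so whichever model has tracked demand better lately dominates the combined forecast.

```python
# Adaptive combination of two parallel forecasters by inverse recent MAE.
import numpy as np

def combine(pred_a, pred_b, err_a_hist, err_b_hist, eps=1e-9):
    """Weight two forecasts by the inverse of each model's recent MAE."""
    wa = 1.0 / (np.mean(err_a_hist) + eps)
    wb = 1.0 / (np.mean(err_b_hist) + eps)
    return (wa * pred_a + wb * pred_b) / (wa + wb)

rng = np.random.default_rng(0)
truth = 100 + 20 * np.sin(np.linspace(0, 4 * np.pi, 96))    # demand signal
arima_like = truth + rng.normal(0, 6, truth.size)           # noisier model
heuristic = truth + rng.normal(0, 2, truth.size)            # better model

err_a, err_b = [], []
combined = np.empty_like(truth)
for t in range(truth.size):
    window_a = err_a[-24:] or [1.0]          # warm-start with neutral error
    window_b = err_b[-24:] or [1.0]
    combined[t] = combine(arima_like[t], heuristic[t], window_a, window_b)
    err_a.append(abs(arima_like[t] - truth[t]))
    err_b.append(abs(heuristic[t] - truth[t]))

print("combined MAE:", np.mean(np.abs(combined - truth)))
```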

  16. Parallel eigenanalysis of finite element models in a completely connected architecture

    NASA Technical Reports Server (NTRS)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.

  17. An assessment of the adaptive unstructured tetrahedral grid, Euler Flow Solver Code FELISA

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Erickson, Larry L.

    1994-01-01

    A three-dimensional solution-adaptive Euler flow solver for unstructured tetrahedral meshes is assessed, and the accuracy and efficiency of the method for predicting sonic boom pressure signatures about simple generic models are demonstrated. Comparison of computational and wind tunnel data and enhancement of numerical solutions by means of grid adaptivity are discussed. The mesh generation is based on the advancing front technique. The FELISA code consists of two solvers, the Taylor-Galerkin and the Runge-Kutta-Galerkin schemes, both of which are spatially discretized by the usual Galerkin weighted residual finite-element methods but with different explicit time-marching schemes to steady state. The solution-adaptive grid procedure is based on either remeshing or mesh refinement techniques. An alternative geometry adaptive procedure is also incorporated.

  18. MPACT Standard Input User's Manual, Version 2.2.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collins, Benjamin S.; Downar, Thomas; Fitzgerald, Andrew

    The MPACT (Michigan PArallel Characteristics based Transport) code is designed to perform high-fidelity light water reactor (LWR) analysis using whole-core pin-resolved neutron transport calculations on modern parallel-computing hardware. The code consists of several libraries which provide the functionality necessary to solve steady-state eigenvalue problems. Several transport capabilities are available within MPACT including both 2-D and 3-D Method of Characteristics (MOC). A three-dimensional whole core solution based on the 2D-1D solution method provides the capability for full core depletion calculations.

  19. Parallel approach for bioinspired algorithms

    NASA Astrophysics Data System (ADS)

    Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina

    2018-05-01

    In the paper, a probabilistic parallel approach based on a population heuristic, such as a genetic algorithm, is suggested. The authors propose using a multithreading approach at the micro level, at which new alternative solutions are generated. On each iteration, several threads can be started that independently use the same population to generate new solutions. After all threads have finished, a selection operator combines the obtained results into the new population. To confirm the effectiveness of the suggested approach, the authors have developed software on the basis of which experimental computations can be carried out. The authors have considered a classic optimization problem – finding a Hamiltonian cycle in a graph. Experiments show that, due to the parallel approach at the micro level, an increase in running speed can be obtained on graphs with 250 or more vertices.
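
    The micro-level multithreading is easy to sketch. In the Python illustration below, several threads independently generate offspring from the same read-only population each iteration, and a selection operator then merges their results into the next population; the tour-length fitness, 2-swap mutation, and thread count are illustrative choices, and CPython's global interpreter lock means the sketch shows the structure rather than the speedup.

```python
# Micro-level parallel GA: threads generate offspring from a shared
# population; selection merges their outputs into the next generation.
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(perm, dist):
    return sum(dist[perm[i]][perm[(i + 1) % len(perm)]] for i in range(len(perm)))

def make_offspring(population, dist, n_children, seed):
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        parent = rng.choice(population)          # population is only read
        child = parent[:]
        i, j = rng.sample(range(len(child)), 2)  # 2-swap mutation
        child[i], child[j] = child[j], child[i]
        children.append(child)
    return children

def evolve(dist, pop_size=40, threads=4, iters=200):
    n = len(dist)
    population = [random.sample(range(n), n) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for it in range(iters):
            futures = [pool.submit(make_offspring, population, dist,
                                   pop_size // threads, 1000 * it + t)
                       for t in range(threads)]
            candidates = population + [c for f in futures for c in f.result()]
            candidates.sort(key=lambda p: fitness(p, dist))  # selection
            population = candidates[:pop_size]
    return population[0]

random.seed(1)
cities = [(random.random(), random.random()) for _ in range(30)]
dist = [[((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5 for b in cities] for a in cities]
best = evolve(dist)
print("best tour length:", round(fitness(best, dist), 3))
```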

  20. EUPDF-II: An Eulerian Joint Scalar Monte Carlo PDF Module : User's Manual

    NASA Technical Reports Server (NTRS)

    Raju, M. S.; Liu, Nan-Suey (Technical Monitor)

    2004-01-01

    EUPDF-II provides the solution for the species and temperature fields based on an evolution equation for the PDF (Probability Density Function) and it is developed mainly for application with sprays, combustion, parallel computing, and unstructured grids. It is designed to be massively parallel and could easily be coupled with any existing gas-phase CFD and spray solvers. The solver accommodates the use of an unstructured mesh with mixed elements of triangular, quadrilateral, and/or tetrahedral type. The manual provides the user with an understanding of the various models involved in the PDF formulation, its code structure and solution algorithm, and various other issues related to parallelization and its coupling with other solvers. The source code of EUPDF-II will be available with the National Combustion Code (NCC) as a complete package.

  1. New Factorization Techniques and Parallel O(log N) Algorithms for Forward Dynamics Solution of Single Closed-Chain Robot Manipulators

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, parallel O(log N) algorithms for dynamic simulation of a single closed-chain rigid multibody system, specialized to the case of a robot manipulator in contact with the environment, are developed.

  2. A mixed parallel strategy for the solution of coupled multi-scale problems at finite strains

    NASA Astrophysics Data System (ADS)

    Lopes, I. A. Rodrigues; Pires, F. M. Andrade; Reis, F. J. P.

    2018-02-01

    A mixed parallel strategy for the solution of homogenization-based multi-scale constitutive problems undergoing finite strains is proposed. The approach aims to reduce the computational time and memory requirements of non-linear coupled simulations that use finite element discretization at both scales (FE^2). In the first level of the algorithm, a non-conforming domain decomposition technique, based on the FETI method combined with a mortar discretization at the interface of macroscopic subdomains, is employed. A master-slave scheme, which distributes tasks by macroscopic element and adopts dynamic scheduling, is then used for each macroscopic subdomain composing the second level of the algorithm. This strategy allows the parallelization of FE^2 simulations in computers with either shared memory or distributed memory architectures. The proposed strategy preserves the quadratic rates of asymptotic convergence that characterize the Newton-Raphson scheme. Several examples are presented to demonstrate the robustness and efficiency of the proposed parallel strategy.
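
    The second level of the strategy, master-slave distribution of macroscopic elements with dynamic scheduling, can be sketched as follows; solve_micro is a hypothetical stand-in for the microscale (RVE) solve at one macroscopic Gauss point, and the pool size and element count are arbitrary.

```python
# Master-slave dynamic scheduling of per-element microscale solves: workers
# pick up new macroscopic elements as soon as they finish the previous one.
import multiprocessing as mp

def solve_micro(task):
    elem_id, macro_strain = task
    # Stand-in for the RVE boundary value problem at one macro Gauss point:
    # return a fake homogenized stress proportional to the applied strain.
    stress = [2.0 * e for e in macro_strain]
    return elem_id, stress

if __name__ == "__main__":
    tasks = [(e, [0.001 * e, 0.0, 0.0]) for e in range(64)]  # 64 macro elements
    with mp.Pool(processes=4) as pool:
        # imap_unordered hands a new element to each worker as soon as it
        # frees up -- the dynamic-scheduling behavior described above.
        for elem_id, stress in pool.imap_unordered(solve_micro, tasks):
            pass  # the master would assemble these into the macro system
    print("all macroscopic elements homogenized")
```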

  3. The design and implementation of a parallel unstructured Euler solver using software primitives

    NASA Technical Reports Server (NTRS)

    Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.

    1992-01-01

    This paper is concerned with the implementation of a three-dimensional unstructured grid Euler solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effects of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.

  4. Optimizing a realistic large-scale frequency assignment problem using a new parallel evolutionary approach

    NASA Astrophysics Data System (ADS)

    Chaves-González, José M.; Vega-Rodríguez, Miguel A.; Gómez-Pulido, Juan A.; Sánchez-Pérez, Juan M.

    2011-08-01

    This article analyses the use of a novel parallel evolutionary strategy to solve complex optimization problems. The work developed here has been focused on a relevant real-world problem from the telecommunication domain to verify the effectiveness of the approach. The problem, known as frequency assignment problem (FAP), basically consists of assigning a very small number of frequencies to a very large set of transceivers used in a cellular phone network. Real data FAP instances are very difficult to solve due to the NP-hard nature of the problem, therefore using an efficient parallel approach which makes the most of different evolutionary strategies can be considered as a good way to obtain high-quality solutions in short periods of time. Specifically, a parallel hyper-heuristic based on several meta-heuristics has been developed. After a complete experimental evaluation, results prove that the proposed approach obtains very high-quality solutions for the FAP and beats any other result published.

  5. A parallel time integrator for noisy nonlinear oscillatory systems

    NASA Astrophysics Data System (ADS)

    Subber, Waad; Sarkar, Abhijit

    2018-06-01

    In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy non-linear dynamical systems. Specifically, we formulate a parallel algorithm to generate the sample path of a nonlinear oscillator defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of the Wiener process in SDEs causes difficulties in the direct application of any numerical integration techniques of ODEs, including the parareal algorithm. The parallel implementation of the algorithm involves two SDE solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits the Message Passing Interface (MPI).
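
    The essential structure of the method, a parareal iteration in which coarse and fine propagators share one fixed Wiener path, is sketched below on a scalar Ornstein-Uhlenbeck process rather than the paper's Duffing oscillator; the slice counts and parameters are illustrative, and the fine solves (the parallelizable part) run serially here for brevity.

```python
# Parareal for an SDE: coarse propagator G and fine propagator F share the
# same pregenerated Brownian increments, so corrections converge to the
# serial fine solution.
import numpy as np

theta, sigma, T, N, M = 1.0, 0.3, 2.0, 20, 50   # N slices, M fine steps/slice
dT, dt = T / N, T / (N * M)
rng = np.random.default_rng(42)
dW = rng.normal(0.0, np.sqrt(dt), size=(N, M))  # one fixed noise path

def F(x, n):                      # fine Euler-Maruyama over slice n
    for m in range(M):
        x = x - theta * x * dt + sigma * dW[n, m]
    return x

def G(x, n):                      # coarse step: one big step, lumped noise
    return x - theta * x * dT + sigma * dW[n].sum()

U = np.zeros(N + 1); U[0] = 1.0
for n in range(N):                # initial coarse sweep
    U[n + 1] = G(U[n], n)

for k in range(5):                # parareal corrections
    Fk = np.array([F(U[n], n) for n in range(N)])   # parallelizable solves
    Gk = np.array([G(U[n], n) for n in range(N)])
    for n in range(N):            # sequential coarse correction sweep
        U[n + 1] = G(U[n], n) + Fk[n] - Gk[n]

ref = 1.0
for n in range(N):
    ref = F(ref, n)               # serial fine reference
print("parareal endpoint:", U[-1], " reference:", ref)
```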

  6. Solution-adaptive finite element method in computational fracture mechanics

    NASA Technical Reports Server (NTRS)

    Min, J. B.; Bass, J. M.; Spradley, L. W.

    1993-01-01

    Some recent results obtained using solution-adaptive finite element method in linear elastic two-dimensional fracture mechanics problems are presented. The focus is on the basic issue of adaptive finite element method for validating the applications of new methodology to fracture mechanics problems by computing demonstration problems and comparing the stress intensity factors to analytical results.

  7. Hardware Implementation of Lossless Adaptive and Scalable Hyperspectral Data Compression for Space

    NASA Technical Reports Server (NTRS)

    Aranki, Nazeeh; Keymeulen, Didier; Bakhshi, Alireza; Klimesh, Matthew

    2009-01-01

    On-board lossless hyperspectral data compression reduces data volume in order to meet NASA and DoD limited downlink capabilities. The technique also improves signature extraction, object recognition and feature classification capabilities by providing exact reconstructed data on constrained downlink resources. At JPL a novel, adaptive and predictive technique for lossless compression of hyperspectral data was recently developed. This technique uses an adaptive filtering method and achieves a combination of low complexity and compression effectiveness that far exceeds state-of-the-art techniques currently in use. The JPL-developed 'Fast Lossless' algorithm requires no training data or other specific information about the nature of the spectral bands for a fixed instrument dynamic range. It is of low computational complexity and thus well-suited for implementation in hardware. A modified form of the algorithm that is better suited for data from pushbroom instruments is generally appropriate for flight implementation. A scalable field programmable gate array (FPGA) hardware implementation was developed. The FPGA implementation achieves a throughput performance of 58 Msamples/sec, which can be increased to over 100 Msamples/sec in a parallel implementation that uses twice the hardware resources. This paper describes the hardware implementation of the 'Modified Fast Lossless' compression algorithm on an FPGA. The FPGA implementation targets the current state-of-the-art FPGAs (Xilinx Virtex IV and V families) and compresses one sample every clock cycle to provide a fast and practical real-time solution for space applications.
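
    The general flavor of adaptive-filter-based predictive lossless compression can be sketched briefly: an adaptive linear predictor (here normalized LMS, an illustrative choice rather than the actual Fast Lossless predictor) produces exactly invertible integer residuals, which a Golomb/Rice-style entropy coder would then pack.

```python
# Adaptive predictive front end for lossless compression: NLMS prediction
# plus a zigzag map of the integer residuals for a Rice/Golomb coder.
import numpy as np

def predict_residuals(samples, order=3, mu=0.5):
    """Adaptive linear prediction; returns exactly invertible integer residuals."""
    w = np.zeros(order)
    history = np.zeros(order)
    residuals = np.empty(len(samples), dtype=np.int64)
    for t, s in enumerate(samples):
        pred = int(round(float(w @ history)))
        residuals[t] = int(s) - pred                      # decoder can undo this
        err = float(s) - float(w @ history)
        w += mu * err * history / (float(history @ history) + 1e-6)  # NLMS
        history = np.roll(history, 1)
        history[0] = float(s)
    return residuals

def zigzag(r):
    """Map signed residuals to nonnegative ints for a Rice/Golomb coder."""
    return np.where(r >= 0, 2 * r, -2 * r - 1)

band = (1000 + 50 * np.sin(np.linspace(0, 20, 512))).astype(np.int64)
res = predict_residuals(band)
print("mean |residual|:", np.abs(res).mean(), "vs mean sample:", band.mean())
```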

  8. Simultaneous learning and filtering without delusions: a Bayes-optimal combination of Predictive Inference and Adaptive Filtering.

    PubMed

    Kneissler, Jan; Drugowitsch, Jan; Friston, Karl; Butz, Martin V

    2015-01-01

    Predictive coding appears to be one of the fundamental working principles of brain processing. Amongst other aspects, brains often predict the sensory consequences of their own actions. Predictive coding resembles Kalman filtering, where incoming sensory information is filtered to produce prediction errors for subsequent adaptation and learning. However, to generate prediction errors given motor commands, a suitable temporal forward model is required to generate predictions. While in engineering applications, it is usually assumed that this forward model is known, the brain has to learn it. When filtering sensory input and learning from the residual signal in parallel, a fundamental problem arises: the system can enter a delusional loop when filtering the sensory information using an overly trusted forward model. In this case, learning stalls before accurate convergence because uncertainty about the forward model is not properly accommodated. We present a Bayes-optimal solution to this generic and pernicious problem for the case of linear forward models, which we call Predictive Inference and Adaptive Filtering (PIAF). PIAF filters incoming sensory information and learns the forward model simultaneously. We show that PIAF is formally related to Kalman filtering and to the Recursive Least Squares linear approximation method, but combines these procedures in a Bayes optimal fashion. Numerical evaluations confirm that the delusional loop is precluded and that the learning of the forward model is more than 10-times faster when compared to a naive combination of Kalman filtering and Recursive Least Squares.
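
    The naive combination that the paper improves upon can be written down directly: filter with the current forward-model estimate, and re-estimate the forward model by recursive least squares from the filtered states. The scalar sketch below (with assumed parameter values) exhibits exactly the loop in which an overly trusted model can bias or stall learning; PIAF additionally tracks uncertainty about the forward model, which this sketch does not.

```python
# Naive Kalman-filtering-plus-RLS loop for a scalar latent state
# x_{t+1} = a*x_t + w, observed as y_t = x_t + v.
import numpy as np

rng = np.random.default_rng(3)
a_true, q, r, T = 0.95, 0.05, 0.5, 2000
x, ys = 1.0, []
for _ in range(T):                       # simulate the system
    x = a_true * x + rng.normal(0, np.sqrt(q))
    ys.append(x + rng.normal(0, np.sqrt(r)))

a_hat, P, x_hat = 0.5, 1.0, 0.0          # model estimate, state covariance
S = 1e-3                                 # RLS information (inverse gain)
for y in ys:
    # Kalman predict/update using the *learned* forward model a_hat:
    x_pred = a_hat * x_hat
    P_pred = a_hat**2 * P + q
    K = P_pred / (P_pred + r)
    x_new = x_pred + K * (y - x_pred)
    P = (1 - K) * P_pred
    # RLS update of a_hat from the filtered state pair (x_hat, x_new):
    S += x_hat * x_hat
    a_hat += (x_hat / S) * (x_new - a_hat * x_hat)
    x_hat = x_new

print("learned a:", round(a_hat, 3), " true a:", a_true)
```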

  9. Some fast elliptic solvers on parallel architectures and their complexities

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Y.

    1989-01-01

    The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.

  10. Some fast elliptic solvers on parallel architectures and their complexities

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
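
    The recurrence at the heart of cyclic reduction is worth seeing concretely. The serial Python sketch below handles the scalar tridiagonal case for n = 2^m - 1 unknowns; BCR generalizes the same even/odd elimination to block rows, and it is the independence of the eliminations within each level that parallelizes.

```python
# Scalar cyclic reduction: each level eliminates every other remaining
# unknown; eliminations within a level are mutually independent.
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system (sub/diag/super a, b, c; a[0]=c[-1]=0)."""
    a, b, c, d = (np.asarray(v, float).copy() for v in (a, b, c, d))
    n = len(b)
    m = int(np.log2(n + 1))
    assert 2**m - 1 == n, "size must be 2**m - 1 for this simple version"
    h = 1
    for _ in range(m - 1):            # forward elimination levels
        for i in range(2 * h - 1, n, 2 * h):
            al = a[i] / b[i - h]
            be = c[i] / b[i + h] if i + h < n else 0.0
            b[i] -= al * c[i - h] + (be * a[i + h] if i + h < n else 0.0)
            d[i] -= al * d[i - h] + (be * d[i + h] if i + h < n else 0.0)
            a[i] = -al * a[i - h]
            c[i] = -be * c[i + h] if i + h < n else 0.0
        h *= 2
    x = np.zeros(n)
    x[h - 1] = d[h - 1] / b[h - 1]    # the single remaining unknown
    while h > 1:                      # back substitution levels
        h //= 2
        for i in range(h - 1, n, 2 * h):
            left = a[i] * x[i - h] if i - h >= 0 else 0.0
            right = c[i] * x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - left - right) / b[i]
    return x

n = 15
rng = np.random.default_rng(0)
a = np.concatenate(([0.0], rng.uniform(-1, 0, n - 1)))   # sub-diagonal
c = np.concatenate((rng.uniform(-1, 0, n - 1), [0.0]))   # super-diagonal
b = 4.0 + np.zeros(n)                                     # dominant diagonal
d = rng.uniform(0, 1, n)
x = cyclic_reduction(a, b, c, d)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print("max error:", np.abs(A @ x - d).max())
```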

  11. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    NASA Astrophysics Data System (ADS)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.

  12. Not different, Just Better: The Adaptive Evolution of an Enzyme

    DTIC Science & Technology

    2015-12-20

    Only fragments of the abstract are recoverable from this record's report documentation page: the enzyme's activity is precisely regulated by allostery and how allostery adapts is unknown; multiple experiments by others have demonstrated adaptive mutations in the same gene, raising the question of whether replicate populations are functionally parallel; and Aim 3 covers the expression, purification, and functional analysis of the evolved pyruvate enzyme (the fragment is truncated at this point).

  13. Specification and Analysis of Parallel Machine Architecture

    DTIC Science & Technology

    1990-03-17

    Only fragments of this report are recoverable: it is by C.V. Ramamoorthy (Computer Science Division, Dept. of Electrical Engineering and Computer Science, University of California) and lists design criteria for parallel machine architectures, among them that the overhead of resolving deadlocks should be in proportion to their frequency and that rollbacks should be avoided; it also describes a tool that displays snapshots of system state graphically at a rate proportional to simulation time.

  14. Compiler and Runtime Support for Programming in Adaptive Parallel Environments

    DTIC Science & Technology

    1998-10-15

    Only fragments of this report are recoverable: it concerns compiler and runtime support that lets a parallel job use a larger number of processors when no other job is waiting for resources and a smaller number when other jobs need resources, and it cites work by Setia et al. on the performance analysis of job scheduling policies in parallel supercomputing environments and on scheduling over networks of heterogeneous workstations.

  15. Automatic Adaptation of Tunable Distributed Applications

    DTIC Science & Technology

    2001-01-01

    Only fragments of this report are recoverable: it concerns tunable distributed applications that adapt to device constraints (the size, weight, and battery life of PDAs, with a single CPU, less memory, a smaller hard disk, and lower-bandwidth network connectivity), to communication facilities such as wireless and Bluetooth with their different data-transmission rates, and to the "write once, run everywhere" trend; in some of these applications a single component, written in a specialized language, can execute on multiple processors or machines in parallel.

  16. Domain Adaptation of Translation Models for Multilingual Applications

    DTIC Science & Technology

    2009-04-01

    Only fragments of this report are recoverable: it notes the query-expansion effect that corpus- or dictionary-based translation introduces (an effect maintained even with monolingual query expansion), observes that bilingual web pages are harvested as parallel corpora as the quantity of non-English data on the web increases, and describes an approach that customizes translation models to a domain by automatically selecting the resources (dictionaries, parallel corpora) that are best for it.

  17. Failure of Anisotropic Unstructured Mesh Adaption Based on Multidimensional Residual Minimization

    NASA Technical Reports Server (NTRS)

    Wood, William A.; Kleb, William L.

    2003-01-01

    An automated anisotropic unstructured mesh adaptation strategy is proposed, implemented, and assessed for the discretization of viscous flows. The adaption criterion is based upon the minimization of the residual fluctuations of a multidimensional upwind viscous flow solver. For scalar advection, this adaption strategy has been shown to use fewer grid points than gradient-based adaption, naturally aligning mesh edges with discontinuities and characteristic lines. The adaption utilizes a compact stencil and is local in scope, with four fundamental operations: point insertion, point deletion, edge swapping, and nodal displacement. Evaluation of the solution-adaptive strategy is performed for a two-dimensional blunt body laminar wind tunnel case at Mach 10. The results demonstrate that the strategy suffers from a lack of robustness, particularly with regard to alignment of the bow shock in the vicinity of the stagnation streamline. In general, constraining the adaption to such a degree as to maintain robustness results in negligible improvement to the solution. Because the present method fails to consistently or significantly improve the flow solution, it is rejected in favor of simple uniform mesh refinement.

  18. Towards Exascale Seismic Imaging and Inversion

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.

    2015-12-01

    Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.

  19. HALOS: fast, autonomous, holographic adaptive optics

    NASA Astrophysics Data System (ADS)

    Andersen, Geoff P.; Gelsinger-Austin, Paul; Gaddipati, Ravi; Gaddipati, Phani; Ghebremichael, Fassil

    2014-08-01

    We present progress on our holographic adaptive laser optics system (HALOS): a compact, closed-loop aberration correction system that uses a multiplexed hologram to deconvolve the phase aberrations in an input beam. The wavefront characterization is based on simple, parallel measurements of the intensity of fixed focal spots and does not require any complex calculations. As such, the system does not require a computer and is thus much cheaper and less complex than conventional approaches. We present details of a fully functional, closed-loop prototype incorporating a 32-element MEMS mirror, operating at a bandwidth of over 10 kHz. Additionally, since the all-optical sensing is made in parallel, the speed is independent of actuator number, running at the same bandwidth for one actuator as for a million.

  20. Parallel Programming Strategies for Irregular Adaptive Applications

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.

  1. Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer

    NASA Technical Reports Server (NTRS)

    Godoy, William F.; Liu, Xu

    2011-01-01

    General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension in radiative solvers (e.g., Monte Carlo, discrete ordinates) at a minimal learning curve.
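
    As a point of reference for the test case above, a serial Gauss-Seidel iteration is sketched below on a generic diagonally dominant system (not the radiative transfer discretization itself); GPU versions typically recolor the unknowns, e.g. red-black, so that updates within a sweep become independent.

```python
# Serial Gauss-Seidel iteration: each update immediately reuses the
# already-updated entries x[:i], which is what GPU reorderings must untangle.
import numpy as np

def gauss_seidel(A, b, iters=100):
    x = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

n = 50
rng = np.random.default_rng(0)
A = rng.uniform(0, 1, (n, n))
A += n * np.eye(n)                  # make it strictly diagonally dominant
b = rng.uniform(0, 1, n)
x = gauss_seidel(A, b)
print("residual:", np.linalg.norm(A @ x - b))
```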

  2. Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

    DOE PAGES

    Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

    2016-11-07

    Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton's method. In this paper, we present a parallel Krylov-Schwarz linear solution scheme that uses the Krylov subspace-based iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. Performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including a real utility network, show good scalability on different computing architectures.

  3. Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

    Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton's method. In this paper, we present a parallel Krylov-Schwarz linear solution scheme that uses the Krylov subspace-based iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. Performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including a real utility network, show good scalability on different computing architectures.
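
    The Krylov-Schwarz combination can be illustrated with standard sparse tools. The sketch below preconditions GMRES with a one-level overlapping additive Schwarz method on a 1-D Laplacian, which stands in for the power-system Jacobian; the restricted variant used in the paper differs mainly in how corrections from the overlap are combined, and subdomain count and overlap width are illustrative.

```python
# GMRES with an overlapping additive Schwarz preconditioner: each subdomain
# solve is independent and would run on its own processor in parallel.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, nsub, overlap = 200, 4, 5
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")

size = n // nsub
domains, solvers = [], []
for k in range(nsub):                       # overlapping index sets + LU
    lo = max(0, k * size - overlap)
    hi = min(n, (k + 1) * size + overlap)
    idx = np.arange(lo, hi)
    domains.append(idx)
    solvers.append(spla.splu(A[idx][:, idx].tocsc()))

def apply_schwarz(r):
    z = np.zeros_like(r)
    for idx, lu in zip(domains, solvers):   # independent local solves
        z[idx] += lu.solve(r[idx])
    return z

M = spla.LinearOperator((n, n), matvec=apply_schwarz)
b = np.ones(n)
x, info = spla.gmres(A, b, M=M)
print("gmres info:", info, " residual:", np.linalg.norm(b - A @ x))
```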

  4. A block-based algorithm for the solution of compressible flows in rotor-stator combinations

    NASA Technical Reports Server (NTRS)

    Akay, H. U.; Ecer, A.; Beskok, A.

    1990-01-01

    A block-based solution algorithm is developed for the solution of compressible flows in rotor-stator combinations. The method allows concurrent solution of multiple solution blocks in parallel machines. It also allows a time averaged interaction at the stator-rotor interfaces. Numerical results are presented to illustrate the performance of the algorithm. The effect of the interaction between the stator and rotor is evaluated.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tipireddy, R.; Stinis, P.; Tartakovsky, A. M.

    In this paper, we present a novel approach for solving steady-state stochastic partial differential equations (PDEs) with high-dimensional random parameter space. The proposed approach combines spatial domain decomposition with basis adaptation for each subdomain. The basis adaptation is used to address the curse of dimensionality by constructing an accurate low-dimensional representation of the stochastic PDE solution (probability density function and/or its leading statistical moments) in each subdomain. Restricting the basis adaptation to a specific subdomain affords finding a locally accurate solution. Then, the solutions from all of the subdomains are stitched together to provide a global solution. We support our construction with numerical experiments for a steady-state diffusion equation with a random spatially dependent coefficient. Lastly, our results show that highly accurate global solutions can be obtained with significantly reduced computational costs.

  6. Modern spandrels: the roles of genetic drift, gene flow and natural selection in the evolution of parallel clines.

    PubMed

    Santangelo, James S; Johnson, Marc T J; Ness, Rob W

    2018-05-16

    Urban environments offer the opportunity to study the role of adaptive and non-adaptive evolutionary processes on an unprecedented scale. While the presence of parallel clines in heritable phenotypic traits is often considered strong evidence for the role of natural selection, non-adaptive evolutionary processes can also generate clines, and this may be more likely when traits have a non-additive genetic basis due to epistasis. In this paper, we use spatially explicit simulations modelled according to the cyanogenesis (hydrogen cyanide, HCN) polymorphism in white clover (Trifolium repens) to examine the formation of phenotypic clines along urbanization gradients under varying levels of drift, gene flow and selection. HCN results from an epistatic interaction between two Mendelian-inherited loci. Our results demonstrate that the genetic architecture of this trait makes natural populations susceptible to decreases in HCN frequencies via drift. Gradients in the strength of drift across a landscape resulted in phenotypic clines with lower frequencies of HCN in strongly drifting populations, giving the misleading appearance of deterministic adaptive changes in the phenotype. Studies of heritable phenotypic change in urban populations should generate null models of phenotypic evolution based on the genetic architecture underlying focal traits prior to invoking selection's role in generating adaptive differentiation. © 2018 The Author(s).
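
    A null model of the kind the authors advocate is straightforward to build. The sketch below (with assumed population sizes, allele frequencies, and generation counts of our own) applies Wright-Fisher drift to two unlinked Mendelian loci and scores an epistatic phenotype requiring a dominant allele at both loci; stronger drift erodes the phenotype's frequency faster than single-locus intuition suggests.

```python
# Drift-only null model for a two-locus epistatic trait (HCN-like): the
# phenotype requires at least one dominant allele at *both* loci.
import numpy as np

rng = np.random.default_rng(7)

def hcn_freq_after_drift(N, p0=0.5, generations=200, reps=500):
    """Mean phenotype frequency after drift in populations of size N."""
    p = np.full((reps, 2), p0)          # allele freqs at the two loci
    for _ in range(generations):        # Wright-Fisher binomial sampling
        p = rng.binomial(2 * N, p) / (2 * N)
    # P(at least one dominant allele at a locus) = 1 - q^2, with q = 1 - p
    present = 1.0 - (1.0 - p) ** 2
    return (present[:, 0] * present[:, 1]).mean()

for N in [25, 100, 500, 5000]:          # a gradient in drift strength
    print(f"N = {N:5d}: mean phenotype frequency = {hcn_freq_after_drift(N):.3f}")
```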

  7. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  8. Scaling Optimization of the SIESTA MHD Code

    NASA Astrophysics Data System (ADS)

    Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan

    2013-10-01

    SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.

  9. Graphics applications utilizing parallel processing

    NASA Technical Reports Server (NTRS)

    Rice, John R.

    1990-01-01

    Results are presented from research conducted to develop a parallel graphics application that depicts the numerical solution of the 1-D wave equation: the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation, and the strategies used to alleviate the effects of the synchronization overhead, are discussed.
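
    For reference, a serial sketch of the underlying numerical scheme (the parallel synchronization machinery is omitted; grid size and wave speed are illustrative): the standard explicit central-difference update for the vibrating string, stable under the CFL condition c*dt/dx <= 1.

      import numpy as np

      # Vibrating string: u_tt = c^2 u_xx with fixed ends, explicit central differences.
      nx, c, length = 101, 1.0, 1.0
      dx = length / (nx - 1)
      dt = 0.5 * dx / c                    # CFL number 0.5, safely stable
      r2 = (c * dt / dx) ** 2

      x = np.linspace(0.0, length, nx)
      u_prev = np.sin(np.pi * x)           # initial displacement (fundamental mode)
      u = u_prev.copy()                    # zero initial velocity (first step approximated)
      for _ in range(400):
          u_next = np.zeros_like(u)        # endpoints stay pinned at zero
          u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                          + r2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
          u_prev, u = u, u_next
      print(round(float(u.max()), 4))      # string has swung through several periods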

  10. The Development and Assessment of Adaptation Pathways for Urban Pluvial Flooding

    NASA Astrophysics Data System (ADS)

    Babovic, F.; Mijic, A.; Madani, K.

    2017-12-01

    Around the globe, urban areas are growing in both size and importance. However, due to the prevalence of impermeable surfaces within the urban fabric, these areas face a high risk of pluvial flooding, and the convergence of population growth and climate change is increasing that risk. When designing solutions and adaptations to pluvial flood risk, urban planners and engineers encounter a great deal of uncertainty arising from model uncertainty, uncertainty within the data utilised, and uncertainty about future climate and land use conditions. The interaction of these uncertainties leads to conditions of deep uncertainty; nevertheless, infrastructure systems must be designed and built in the face of it. An Adaptation Tipping Points (ATP) methodology was used to develop a strategy to adapt an urban drainage system in the North East of London under conditions of deep uncertainty. The ATP approach was used to assess the current drainage system and potential drainage system adaptations. These adaptations were assessed against potential changes in rainfall depth and peakedness, defined as the ratio of mean to peak rainfall. The options encompassed both traditional and blue-green solutions that the Local Authority is known to be considering. This resulted in a set of Adaptation Pathways. However, these pathways do not convey any information regarding the relative merits and demerits of the potential adaptation options presented. To address this, a cost-benefit metric was developed to reflect the solutions' costs and benefits under uncertainty. The resulting metric combines elements of the Benefits of SuDS Tool (BeST) with real options analysis in order to reflect the potential value of ecosystem services delivered by blue-green solutions under uncertainty. Lastly, it is discussed how a local body can utilise the adaptation pathways, their relative costs and benefits, and a system of local data collection to help guide better decision making with respect to urban flood adaptation.

  11. An FPGA-based DS-CDMA multiuser demodulator employing adaptive multistage parallel interference cancellation

    NASA Astrophysics Data System (ADS)

    Li, Xinhua; Song, Zhenyu; Zhan, Yongjie; Wu, Qiongzhi

    2009-12-01

    Since system capacity is severely limited by multiple access interference (MAI), reducing MAI is necessary in the multiuser direct-sequence code division multiple access (DS-CDMA) system used in the data-transfer link of telecommunication terminals. In this paper, after an overview of various multiuser detection schemes, we adopt an adaptive multistage parallel interference cancellation (PIC) structure in the demodulator, based on the least mean square (LMS) algorithm, to eliminate the MAI. Neither a training sequence nor a pilot signal is needed in the proposed scheme, and its implementation complexity can be greatly reduced by an approximate LMS algorithm. The algorithm and its FPGA implementation are then derived. Simulations show that the proposed adaptive PIC can outperform some existing interference cancellation methods in AWGN channels. The hardware setup of the multiuser demodulator is described, and experimental results demonstrate large performance gains over the conventional single-user demodulator.
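
    The LMS core of such a canceller is compact. The sketch below (a scalar, decision-directed toy, not the paper's FPGA design; signal amplitudes and step size are invented) subtracts a weighted estimate of one interfering user's chips from the received signal and adapts the weight from the decision error alone, so no training sequence or pilot is required.

      import numpy as np

      rng = np.random.default_rng(0)

      n, mu = 4000, 0.01
      desired = np.sign(rng.standard_normal(n))   # desired user's chips (+/-1)
      interf = np.sign(rng.standard_normal(n))    # interfering user's chips
      received = desired + 0.8 * interf           # observation corrupted by MAI

      w, err = 0.0, []
      for k in range(n):
          y = received[k] - w * interf[k]         # cancel the estimated MAI
          e = y - np.sign(y)                      # decision-directed error signal
          w += mu * e * interf[k]                 # LMS update: w <- w + mu*e*x
          err.append(e * e)

      print("weight ->", round(w, 3))             # converges toward the true 0.8
      print("late-stage MSE:", round(float(np.mean(err[-500:])), 5))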

  12. The Parallel Episodic Processing (PEP) model: dissociating contingency and conflict adaptation in the item-specific proportion congruent paradigm.

    PubMed

    Schmidt, James R

    2013-01-01

    The present work introduces a computational model, the Parallel Episodic Processing (PEP) model, which demonstrates that contingency learning achieved via simple storage and retrieval of episodic memories can explain the item-specific proportion congruency (ISPC) effect in the colour-word Stroop paradigm. The current work also presents a new experimental procedure to more directly dissociate contingency biases from conflict adaptation (i.e., proportion congruency). This was done with three different types of incongruent words that allow a comparison of: (a) high versus low contingency while keeping proportion congruency constant, and (b) high versus low proportion congruency while keeping contingency constant. Results demonstrated a significant contingency effect, but no effect of proportion congruency. It was further shown that the proportion congruency associated with the colour does not matter, either. Thus, the results quite directly demonstrate that ISPC effects are not due to conflict adaptation, but instead to contingency learning biases. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. A new spherical model for computing the radiation field available for photolysis and heating at twilight

    NASA Technical Reports Server (NTRS)

    Dahlback, Arne; Stamnes, Knut

    1991-01-01

    Accurate computation of atmospheric photodissociation and heating rates is needed in photochemical models. These quantities are proportional to the mean intensity of the solar radiation penetrating to various levels in the atmosphere. For large solar zenith angles, a solution of the radiative transfer equation valid for a spherical atmosphere is required in order to obtain accurate values of the mean intensity. Such a solution, based on a perturbation technique combined with the discrete ordinate method, is presented. Mean intensity calculations are carried out for various solar zenith angles. These results are compared with calculations from a plane parallel radiative transfer model in order to assess the importance of using correct geometry around sunrise and sunset. This comparison shows, in agreement with previous investigations, that for solar zenith angles less than 90 deg adequate solutions are obtained for plane parallel geometry as long as spherical geometry is used to compute the direct beam attenuation; but for solar zenith angles greater than 90 deg this pseudospherical plane parallel approximation overestimates the mean intensity.
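
    The geometric point is easy to reproduce numerically. The sketch below (illustrative parameters; a simple exponential atmosphere, not the paper's model) integrates extinction along the straight-line path to the sun through spherical shells and compares it with the plane-parallel sec(theta) slant factor, which diverges at 90 deg while the spherical path stays finite.

      import numpy as np

      R_E, H = 6371.0, 8.0          # Earth radius and density scale height, km

      def slant_tau(z0, sza_deg, tau_vert=1.0, ds=0.1):
          """Optical depth along the direct beam from altitude z0 toward the sun,
          marched through a spherically symmetric exponential atmosphere."""
          sza = np.radians(sza_deg)
          x, y = 0.0, R_E + z0                  # start point; sun in the x-y plane
          dx, dy = np.sin(sza), np.cos(sza)     # unit vector toward the sun
          tau, z = 0.0, z0
          while z < 100.0:                      # march until we leave the atmosphere
              x += dx * ds
              y += dy * ds
              z = np.hypot(x, y) - R_E
              tau += (tau_vert / H) * np.exp(-z / H) * ds   # local extinction
          return tau

      for sza in (60.0, 85.0, 90.0, 92.0):
          pp = 1.0 / np.cos(np.radians(sza)) if sza < 90.0 else float("inf")
          print(sza, round(slant_tau(20.0, sza), 3), " plane-parallel sec:", round(pp, 3))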

  14. Crenulation cleavage development by partitioning of deformation into zones of progressive shearing (combined shearing, shortening and volume loss) and progressive shortening (no volume loss): quantification of solution shortening and intermicrolithon-movement

    NASA Astrophysics Data System (ADS)

    Stewart, L. K.

    1997-11-01

    An analytical method for determining the amounts of cleavage-normal dissolution and cleavage-parallel shear movement that occurred between adjacent microlithons during crenulation cleavage seam formation within a deformed slate is developed for the progressive bulk inhomogeneous shortening (PBIS) mechanism of crenulation cleavage formation. The method utilises structural information obtained from samples where a diverging bed and vein are offset by a crenulation cleavage seam. Several samples analysed using this method produced ratios of relative cleavage-parallel movement of microlithons to the material thickness removed by dissolution, typically in the range of 1.1-3.4:1. The mean amount of solution shortening attributed to the formation of the cleavage seams examined is 24%. The results indicate that a relationship may exist between the width of microlithons and the amount of cleavage-parallel intermicrolithon-movement. The method presented here has the potential to help determine whether crenulation cleavage seams formed by the progressive bulk inhomogeneous shortening mechanism or by one involving cleavage-normal pressure solution alone.

  15. Multidisciplinary design optimization for sonic boom mitigation

    NASA Astrophysics Data System (ADS)

    Ozcer, Isik A.

    Automated, parallelized, time-efficient surface definition, grid generation and flow simulation methods are developed for sharp and accurate sonic boom signal computation in three dimensions in the near and mid-field of an aircraft using Euler/Full-Potential unstructured/structured computational fluid dynamics. The full-potential mid-field sonic boom prediction code is an accurate and efficient solver featuring automated grid generation, grid adaptation and shock fitting, and parallel processing. This program quickly marches the solution using a single nonlinear equation over large distances that cannot be covered with Euler solvers due to large memory and long computational time requirements. The solver takes into account variations in temperature and pressure with altitude. The far-field signal prediction is handled using the classical linear Thomas Waveform Parameter Method, where the switching altitude from the nonlinear to the linear prediction is determined by convergence of the ground signal pressure impulse value. This altitude is determined as r/L ≈ 10 from the source for a simple lifting wing, and r/L ≈ 40 for a real complex aircraft. The unstructured grid adaptation and shock fitting methodology developed for the near-field analysis employs a Hessian-based anisotropic grid adaptation based on error equidistribution. A special field scalar is formulated for use in the computation of the Hessian-based error metric, which significantly enhances the adaptation scheme for shocks. The entire cross-flow of a complex aircraft is resolved with high fidelity using only 500,000 grid nodes after only about 10 solution/adaptation cycles. Shock fitting is accomplished using Roe's Flux-Difference Splitting scheme, an approximate Riemann-type solver, and by proper alignment of the cell faces with respect to shock surfaces. Simple to complex real aircraft geometries are handled with no user intervention required, making the simulation methods suitable tools for product design. The simulation tools are used to optimize three geometries for sonic boom mitigation. The first is a simple axisymmetric shape to be used as a generic nose component, the second is a delta wing with lift, and the third is a real aircraft with nose and wing optimization. The objectives are to minimize the pressure impulse or the peak pressure in the sonic boom signal, while keeping the drag penalty under feasible limits. The design parameters for the meridian profile of the nose shape are the lengths and the half-cone angles of the linear segments that make up the profile. The design parameters for the lifting wing are the dihedral angle, angle of attack, and non-linear span-wise twist and camber distribution. The test-bed aircraft is the modified F-5E aircraft built by Northrop Grumman, designated the Shaped Sonic Boom Demonstrator. This aircraft is fitted with an optimized axisymmetric nose, and the wings are optimized to demonstrate optimization for sonic boom mitigation for a real aircraft. The final results predict a 42% reduction in bow shock strength, 17% reduction in peak Δp, 22% reduction in pressure impulse, 10% reduction in footprint size, 24% reduction in inviscid drag, and no loss in lift for the optimized aircraft. Optimization is carried out using response surface methodology, and the design matrices are determined using standard DoE techniques for quadratic response modeling.
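
    The Hessian-based metric at the heart of such an adaptation step can be sketched compactly. The example below (a generic 2-D illustration on a structured grid, not the dissertation's unstructured implementation; the error level and spacing bounds are invented) symmetrizes the Hessian of a scalar field, takes absolute eigenvalues, and bounds them, yielding an anisotropic metric M = R|Lambda|R^T that equidistributes interpolation error across a shock-like feature.

      import numpy as np

      def hessian_metric(u, dx, err=1e-2, h_min=1e-3, h_max=1.0):
          """Generic Hessian-based anisotropic metric on a uniform 2-D grid:
          M = R |Lambda| R^T with eigenvalues clipped so requested edge
          lengths stay between h_min and h_max (error equidistribution)."""
          uy, ux = np.gradient(u, dx)               # first derivatives
          uyy, uyx = np.gradient(uy, dx)
          uxy, uxx = np.gradient(ux, dx)
          off = 0.5 * (uxy + uyx)                   # symmetrize mixed terms
          H = np.stack([np.stack([uxx, off], -1),
                        np.stack([off, uyy], -1)], -2)
          lam, vec = np.linalg.eigh(H)              # eigen-decomposition of H
          lam = np.clip(np.abs(lam) / err, 1.0 / h_max**2, 1.0 / h_min**2)
          return vec @ (lam[..., None] * np.swapaxes(vec, -1, -2))

      # a shock-like tanh front: the metric demands fine spacing across x = 0.5
      x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
      M = hessian_metric(np.tanh((x - 0.5) / 0.02), 1.0 / 63)
      print(M[32, 32])                              # strongly anisotropic entry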

  16. Asymptotic Linearity of Optimal Control Modification Adaptive Law with Analytical Stability Margins

    NASA Technical Reports Server (NTRS)

    Nguyen, Nhan T.

    2010-01-01

    Optimal control modification has been developed to improve the robustness of model-reference adaptive control. For systems with linear matched uncertainty, the optimal control modification adaptive law can be shown by a singular perturbation argument to possess an outer solution that exhibits a linear asymptotic property. Analytical expressions of phase and time delay margins for the outer solution can be obtained. Using the gradient projection operator, a free design parameter of the adaptive law can be selected to satisfy stability margins.

  17. Research of the effectiveness of parallel multithreaded realizations of interpolation methods for scaling raster images

    NASA Astrophysics Data System (ADS)

    Vnukov, A. A.; Shershnev, M. B.

    2018-01-01

    The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of the algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of the computations. Three interpolation methods were studied, formalized and adapted to image scaling. The result of the work is a program that scales images by the different methods, and a comparison of the scaling quality of the methods is given.
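
    A common way to parallelize such interpolation is to split the output image into row strips, one per worker. The sketch below (bilinear interpolation only; strip count and image size are illustrative, and this is not the authors' Windows application) does exactly that with a thread pool over NumPy row blocks.

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor

      def bilinear_strip(args):
          """Scale one horizontal strip of the output with bilinear interpolation."""
          src, rows, scale = args
          h, w = src.shape
          out_w = int(w * scale)
          sx = np.minimum(np.arange(out_w) / scale, w - 1.000001)
          x0 = sx.astype(int)                       # left neighbors and fractions
          fx = sx - x0
          out = np.empty((len(rows), out_w))
          for i, r in enumerate(rows):
              sy = min(r / scale, h - 1.000001)
              y0 = int(sy)
              fy = sy - y0
              top = src[y0, x0] * (1 - fx) + src[y0, x0 + 1] * fx
              bot = src[y0 + 1, x0] * (1 - fx) + src[y0 + 1, x0 + 1] * fx
              out[i] = top * (1 - fy) + bot * fy
          return rows[0], out

      src = np.random.default_rng(0).random((256, 256))
      scale, workers, out = 2.0, 4, np.empty((512, 512))
      strips = np.array_split(np.arange(512), workers)   # one row strip per thread
      with ThreadPoolExecutor(workers) as pool:
          for r0, block in pool.map(bilinear_strip, [(src, s, scale) for s in strips]):
              out[r0:r0 + len(block)] = block
      print(out.shape)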

  18. Hierarchical Compliance Control of a Soft Ankle Rehabilitation Robot Actuated by Pneumatic Muscles.

    PubMed

    Liu, Quan; Liu, Aiming; Meng, Wei; Ai, Qingsong; Xie, Sheng Q

    2017-01-01

    Traditional compliance control of a rehabilitation robot is implemented in task space by using impedance or admittance control algorithms. A soft robot actuated by pneumatic muscle actuators (PMAs) is becoming prominent for patients as it enables compliance to be adjusted in each active link, a capability which has not previously been reported in the literature. This paper proposes a new compliance control method for a soft ankle rehabilitation robot that is driven by four PMAs configured in parallel to enable three-degree-of-freedom movement of the ankle joint. A new hierarchical compliance control structure, including a low-level compliance adjustment controller in joint space and a high-level admittance controller in task space, is designed. An adaptive compliance control paradigm is further developed by taking into account the patient's active contribution and movement ability during a preceding period of time, in order to provide robot assistance only when it is actually required. Experiments on healthy and impaired human subjects were conducted to verify the adaptive hierarchical compliance control scheme. The results show that the robot's hierarchical compliance can be adjusted online according to the participant's assessment. The robot reduces its assistance output when participants contribute more and vice versa, thus providing a potentially feasible solution to the patient-in-the-loop cooperative training strategy.
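
    The high-level admittance layer can be illustrated with a one-axis sketch (the gains, force profile and effort-to-stiffness mapping below are invented for illustration, not taken from the paper): the virtual dynamics M*xdd + B*xd + K*x = f yield a motion reference from the measured interaction force, and stiffness is lowered as measured patient effort rises, which is the assist-as-needed idea; the resulting reference would then be tracked by the low-level joint-space compliance controller.

      import numpy as np

      # One-axis admittance: M*xdd + B*xd + K*x = f, integrated at 500 Hz.
      M, B, K0, dt = 1.0, 8.0, 40.0, 0.002
      x, xd = 0.0, 0.0
      for step in range(1500):
          f = 5.0 * np.sin(2 * np.pi * 0.5 * step * dt)  # measured interaction force
          effort = abs(f) / 5.0                          # normalized patient effort, 0..1
          K = K0 * (1.0 - 0.7 * effort)                  # more effort -> softer robot
          xdd = (f - B * xd - K * x) / M                 # virtual admittance dynamics
          xd += xdd * dt
          x += xd * dt                                   # task-space motion reference
      print(round(x, 4))                                 # reference handed to joint level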

  20. A DFT study of the stability of SIAs and small SIA clusters in the vicinity of solute atoms in Fe

    NASA Astrophysics Data System (ADS)

    Becquart, C. S.; Ngayam Happy, R.; Olsson, P.; Domain, C.

    2018-03-01

    The energetics, defect volumes and magnetic properties of single SIAs and small SIA clusters up to size 6 have been calculated by DFT for different configurations such as the parallel 〈110〉 dumbbell, the non-parallel 〈110〉 dumbbell and the C15 structure. The most stable configurations of each type have been further analyzed to determine the influence of various solute atoms (Ti, V, Cr, Mn, Co, Ni, Cu, Mo, W, Pd, Al, Si, P), relevant for steels used under irradiation, on their stability. The results show that the presence of solute atoms does not change the relative stability order among SIA clusters. The small SIA clusters investigated can bind to both undersized and oversized solutes. Several descriptors have been considered to derive interesting trends from the results. It appears that the local atomic volume available for the solute is the main physical quantity governing the evolution of the binding energy, whatever the solute type (undersized or oversized) and the cluster configuration (size and type).

  1. Kinematics of an in-parallel actuated manipulator based on the Stewart platform mechanism

    NASA Technical Reports Server (NTRS)

    Williams, Robert L., II

    1992-01-01

    This paper presents kinematic equations and solutions for an in-parallel actuated robotic mechanism based on Stewart's platform. These equations are required for inverse position and resolved rate (inverse velocity) platform control. NASA LaRC has a Vehicle Emulator System (VES) platform, designed by MIT, which is based on Stewart's platform. The inverse position solution is straightforward and computationally inexpensive: given the desired position and orientation of the moving platform with respect to the base, the lengths of the prismatic leg actuators are calculated. The forward position solution is more complicated and theoretically has 16 solutions; here, the position and orientation of the moving platform with respect to the base are calculated given the leg actuator lengths. Two methods are pursued in this paper to solve this problem. The resolved rate (inverse velocity) solution is also derived: given the desired Cartesian velocity of the end-effector, the required leg actuator rates are calculated. The Newton-Raphson Jacobian matrix resulting from the second forward position kinematics solution is a modified inverse Jacobian matrix. Examples and simulations are given for the VES.
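
    The inverse position solution amounts to one vector expression per leg: with platform attachment points b_i, base points a_i, translation t and rotation R, each prismatic length is |t + R*b_i - a_i|. A sketch with a hypothetical hexapod geometry (the attachment circles and pose below are invented, not the VES dimensions):

      import numpy as np

      def leg_lengths(t, rpy, base_pts, plat_pts):
          """Inverse position solution: prismatic leg lengths for a desired
          platform pose (translation t, roll-pitch-yaw angles rpy)."""
          cr, cp, cy = np.cos(rpy)
          sr, sp, sy = np.sin(rpy)
          # rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll)
          R = np.array([[cy*cp, cy*sp*sr - sy*cr, cy*sp*cr + sy*sr],
                        [sy*cp, sy*sp*sr + cy*cr, sy*sp*cr - cy*sr],
                        [-sp,   cp*sr,            cp*cr]])
          return np.linalg.norm(t + plat_pts @ R.T - base_pts, axis=1)

      # hypothetical hexapod: attachment points on two circles, platform offset 0.3 rad
      ang = np.radians([0, 60, 120, 180, 240, 300])
      base = np.stack([2 * np.cos(ang), 2 * np.sin(ang), np.zeros(6)], axis=1)
      plat = np.stack([np.cos(ang + 0.3), np.sin(ang + 0.3), np.zeros(6)], axis=1)
      pose_t = np.array([0.0, 0.0, 1.5])           # desired platform translation
      pose_rpy = np.array([0.0, 0.05, 0.10])       # small roll-pitch-yaw
      print(leg_lengths(pose_t, pose_rpy, base, plat).round(4))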

  2. Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Paul T.; Shadid, John N.; Sala, Marzio

    In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system is obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10^8 unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.

  3. Space-time mesh adaptation for solute transport in randomly heterogeneous porous media.

    PubMed

    Dell'Oca, Aronne; Porta, Giovanni Michele; Guadagnini, Alberto; Riva, Monica

    2018-05-01

    We assess the impact of an anisotropic space and time grid adaptation technique on our ability to solve numerically solute transport in heterogeneous porous media. Heterogeneity is characterized in terms of the spatial distribution of hydraulic conductivity, whose natural logarithm, Y, is treated as a second-order stationary random process. We consider nonreactive transport of dissolved chemicals to be governed by an Advection Dispersion Equation at the continuum scale. The flow field, which provides the advective component of transport, is obtained through the numerical solution of Darcy's law. A suitable recovery-based error estimator is analyzed to guide the adaptive discretization. We investigate two diverse strategies guiding the (space-time) anisotropic mesh adaptation. These are respectively grounded on the definition of the guiding error estimator through the spatial gradients of: (i) the concentration field only; (ii) both concentration and velocity components. We test the approach for two-dimensional computational scenarios with moderate and high levels of heterogeneity, the latter being expressed in terms of the variance of Y. As quantities of interest, we key our analysis towards the time evolution of section-averaged and point-wise solute breakthrough curves, second centered spatial moment of concentration, and scalar dissipation rate. As a reference against which we test our results, we consider corresponding solutions associated with uniform space-time grids whose level of refinement is established through a detailed convergence study. We find a satisfactory comparison between results for the adaptive methodologies and such reference solutions, our adaptive technique being associated with a markedly reduced computational cost. Comparison of the two adaptive strategies tested suggests that: (i) defining the error estimator relying solely on concentration fields yields some advantages in grasping the key features of solute transport taking place within low velocity regions, where diffusion-dispersion mechanisms are dominant; and (ii) embedding the velocity field in the error estimator guiding strategy yields an improved characterization of the forward fringe of solute fronts which propagate through high velocity regions. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.

    2016-01-01

    MADNESS (multiresolution adaptive numerical environment for scientific simulation) is a high-level software environment for solving integral and differential equations in many dimensions that uses adaptive and fast harmonic analysis methods with guaranteed precision based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses some current applications in chemistry and several areas of physics.

  5. Reactive Transport Modeling of Induced Calcite Precipitation Reaction Fronts in Porous Media Using A Parallel, Fully Coupled, Fully Implicit Approach

    NASA Astrophysics Data System (ADS)

    Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.

    2010-12-01

    Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a de-coupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinements for 2D and 3D structured and unstructured mesh. The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches, over de-coupled methods, for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.

  6. Biomimetics on seed dispersal: survey and insights for space exploration.

    PubMed

    Pandolfi, Camilla; Izzo, Dario

    2013-06-01

    Seeds provide the vital genetic link and dispersal agent between successive generations of plants. Without seed dispersal as a means of reproduction, many plants would quickly die out. Because plants lack any sort of mobility and remain in the same spot for their entire lives, they rely on seed dispersal to transport their offspring throughout the environment. This can be accomplished either collectively or individually; in any case, as seeds ultimately abdicate their movement, they are at the mercy of environmental factors. Thus, seed dispersal strategies are characterized by robustness, adaptability, intelligence (both behavioral and morphological), and mass and energy efficiency (including the ability to utilize available environmental sources of energy): all qualities that advanced engineering systems aim at in general, and in particular those that need to enable complex endeavors such as space exploration. Plants evolved and adapted their strategies according to their environment and, taken together, they embody many desirable characteristics that a space mission needs to have. Understanding in detail how plants control the development of seeds, fabricate structural components for their dispersal, build molecular machineries to keep seeds dormant up to the right moment, and monitor the environment to release them at the right time could provide several solutions impacting current space mission design practices. It can lead to miniaturization, higher integration and packing efficiency, energy efficiency, and higher autonomy and robustness. Consequently, there would appear to be good reasons for considering biomimetic solutions from the plant kingdom when designing space missions, especially to other celestial bodies, where solid and liquid surfaces, atmospheres, etc. constitute an obvious parallel with the terrestrial environment where plants evolved. In this paper, we review the current state of biomimetics on seed dispersal to improve space mission design.

  7. Military Curricula for Vocational & Technical Education. Basic Electricity and Electronics Individualized Learning System. CANTRAC A-100-0010. Module Six: Parallel Circuits. Study Booklet.

    ERIC Educational Resources Information Center

    Chief of Naval Education and Training Support, Pensacola, FL.

    This individualized learning module on parallel circuits is one in a series of modules for a course in basic electricity and electronics. The course is one of a number of military-developed curriculum packages selected for adaptation to vocational instructional and curriculum development in a civilian setting. Four lessons are included in the…

  8. GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

    NASA Astrophysics Data System (ADS)

    Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

    2016-10-01

    A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, cell-based adaptive mesh refinement (AMR) is fully implemented on the GPU for an unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. The results obtained by GPUs agree very well with the exact or experimental results in the literature. An acceleration ratio of 4 is obtained between the parallel code running on the older GT9800 GPU and the serial code running on an E3-1230 V2 CPU. With the optimization of configuring a larger L1 cache and adopting shared-memory-based atomic operations on the newer C2050 GPU, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved a 2x speedup on the GT9800 and 18x on the Tesla C2050, which demonstrates that parallel execution of the cell-based AMR method on the GPU is feasible and efficient. Our results also indicate that new developments in GPU architecture benefit fluid dynamics computing significantly.
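
    The list-parallel pattern behind the AMR step can be shown in a few lines. The sketch below (a serial NumPy illustration with an invented gradient criterion, not the paper's CUDA code) flags cells for refinement and compacts the flags into a refinement list via an exclusive prefix sum, the deterministic counterpart of the atomic-counter appends used on the GPU.

      import numpy as np

      def refine_list(u, threshold):
          """Flag cells whose gradient exceeds a threshold, then compact the
          flags into a list of cell ids (the flag-and-compact AMR pattern)."""
          grad = np.abs(np.diff(u, append=u[-1]))           # crude per-cell gradient
          flags = (grad > threshold).astype(np.int64)
          offsets = np.cumsum(flags) - flags                # exclusive prefix sum
          out = np.empty(int(flags.sum()), dtype=np.int64)
          out[offsets[flags == 1]] = np.nonzero(flags)[0]   # scatter cell ids
          return out

      u = np.tanh(np.linspace(-3.0, 3.0, 64) / 0.2)         # sharp front mid-domain
      print(refine_list(u, 0.2))                            # cells straddling the front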

  9. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

  10. Reactive silica transport in fractured porous media: Analytical solutions for a system of parallel fractures

    NASA Astrophysics Data System (ADS)

    Yang, Jianwen

    2012-04-01

    A general analytical solution is derived by using the Laplace transformation to describe transient reactive silica transport in a conceptualized 2-D system involving a set of parallel fractures embedded in an impermeable host rock matrix, taking into account hydrodynamic dispersion and advection of silica transport along the fractures, molecular diffusion from each fracture to the intervening rock matrix, and dissolution of quartz. A special analytical solution is also developed by ignoring the longitudinal hydrodynamic dispersion term but keeping the other conditions the same. The general and special solutions are in the form of a double infinite integral and a single infinite integral, respectively, and can be evaluated using the Gauss-Legendre quadrature technique. A simple criterion is developed to determine under what conditions the general analytical solution can be approximated by the special analytical solution. It is proved analytically that the general solution always lags behind the special solution, unless a dimensionless parameter is less than a critical value. Several illustrative calculations are undertaken to demonstrate the effect of fracture spacing, fracture aperture and fluid flow rate on silica transport. The analytical solutions developed here can serve as a benchmark to validate numerical models that simulate reactive mass transport in fractured porous media.
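
    The quadrature step is generic enough to sketch. The example below (a stand-in integrand with a known closed form, not the paper's kernel; truncation points are illustrative) maps Gauss-Legendre nodes onto a truncated semi-infinite interval and checks convergence as the truncation bound grows.

      import numpy as np

      def gauss_legendre_integral(f, a, b, n=64):
          """Integrate f over [a, b] with n-point Gauss-Legendre quadrature."""
          xi, wi = np.polynomial.legendre.leggauss(n)   # nodes/weights on [-1, 1]
          x = 0.5 * (b - a) * xi + 0.5 * (b + a)        # affine map to [a, b]
          return 0.5 * (b - a) * float(np.sum(wi * f(x)))

      # stand-in for a single infinite integral: truncate at b, watch convergence
      f = lambda t: np.exp(-t) * np.cos(3.0 * t)        # exact value: 1/10
      for b in (5.0, 10.0, 20.0):
          print(b, round(gauss_legendre_integral(f, 0.0, b), 8))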

  11. A solution-adaptive hybrid-grid method for the unsteady analysis of turbomachinery

    NASA Technical Reports Server (NTRS)

    Mathur, Sanjay R.; Madavan, Nateri K.; Rajagopalan, R. G.

    1993-01-01

    A solution-adaptive method for the time-accurate analysis of two-dimensional flows in turbomachinery is described. The method employs a hybrid structured-unstructured zonal grid topology in conjunction with appropriate modeling equations and solution techniques in each zone. The viscous flow region in the immediate vicinity of the airfoils is resolved on structured O-type grids while the rest of the domain is discretized using an unstructured mesh of triangular cells. Implicit, third-order accurate, upwind solutions of the Navier-Stokes equations are obtained in the inner regions. In the outer regions, the Euler equations are solved using an explicit upwind scheme that incorporates a second-order reconstruction procedure. An efficient and robust grid adaptation strategy, including both grid refinement and coarsening capabilities, is developed for the unstructured grid regions. Grid adaptation is also employed to facilitate information transfer at the interfaces between unstructured grids in relative motion. Results for grid adaptation to various features pertinent to turbomachinery flows are presented. Good comparisons between the present results and experimental measurements and earlier structured-grid results are obtained.

  12. Multiprocessor smalltalk: Implementation, performance, and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.

  13. Comparing Anisotropic Output-Based Grid Adaptation Methods by Decomposition

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Loseille, Adrien; Krakos, Joshua A.; Michal, Todd

    2015-01-01

    Anisotropic grid adaptation is examined by decomposing the steps of flow solution, adjoint solution, error estimation, metric construction, and simplex grid adaptation. Multiple implementations of each of these steps are evaluated by comparison to each other and to expected analytic results when available. For example, grids are adapted to analytic metric fields and grid measures are computed to illustrate the properties of multiple independent implementations of grid adaptation mechanics. Different implementations of each step in the adaptation process can be evaluated in a system where the other components of the adaptive cycle are fixed. Detailed examination of these properties allows comparison of different methods to identify the current state of the art and where further development should be targeted.

  14. Transcriptomic imprints of adaptation to fresh water: parallel evolution of osmoregulatory gene expression in the Alewife

    USGS Publications Warehouse

    Velotta, Jonathan P.; Wegrzyn, Jill L.; Ginzburg, Samuel; Kang, Lin; Czesny, Sergiusz J.; O'Neill, Rachel J.; McCormick, Stephen; Michalak, Pawel; Schultz, Eric T.

    2017-01-01

    Comparative approaches in physiological genomics offer an opportunity to understand the functional importance of genes involved in niche exploitation. We used populations of Alewife (Alosa pseudoharengus) to explore the transcriptional mechanisms that underlie adaptation to fresh water. Ancestrally anadromous Alewives have recently formed multiple, independently derived, landlocked populations, which exhibit reduced tolerance of saltwater and enhanced tolerance of fresh water. Using RNA-seq, we compared transcriptional responses of an anadromous Alewife population to two landlocked populations after acclimation to fresh (0 ppt) and saltwater (35 ppt). Our results suggest that the gill transcriptome has evolved in primarily discordant ways between independent landlocked populations and their anadromous ancestor. By contrast, evolved shifts in the transcription of a small suite of well-characterized osmoregulatory genes exhibited a strong degree of parallelism. In particular, transcription of genes that regulate gill ion exchange has diverged in accordance with functional predictions: freshwater ion-uptake genes (most notably, the ‘freshwater paralog’ of Na+/K+-ATPase α-subunit) were more highly expressed in landlocked forms, whereas genes that regulate saltwater ion secretion (e.g. the ‘saltwater paralog’ of NKAα) exhibited a blunted response to saltwater. Parallel divergence of ion transport gene expression is associated with shifts in salinity tolerance limits among landlocked forms, suggesting that changes to the gill's transcriptional response to salinity facilitate freshwater adaptation.

  15. Parallel Implementation of Numerical Solution of Few-Body Problem Using Feynman's Continual Integrals

    NASA Astrophysics Data System (ADS)

    Naumenko, Mikhail; Samarin, Viacheslav

    2018-02-01

    A modern parallel computing algorithm has been applied to the solution of the few-body problem. The approach is based on Feynman's continual integrals method implemented in the C++ programming language using NVIDIA CUDA technology. A wide range of 3-body and 4-body bound systems has been considered, including nuclei described as consisting of protons and neutrons (e.g., 3,4He) and nuclei described as consisting of clusters and nucleons (e.g., 6He). The correctness of the results was checked by comparison with the exactly solvable 4-body oscillatory system and with experimental data.

  16. Evolving binary classifiers through parallel computation of multiple fitness cases.

    PubMed

    Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

    2005-06-01

    This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computational efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly, using parallel computation, in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures, in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.

  17. An asymptotic induced numerical method for the convection-diffusion-reaction equation

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.; Sorensen, Danny C.

    1988-01-01

    A parallel algorithm for the efficient solution of a time-dependent reaction-convection-diffusion equation with a small parameter on the diffusion term is presented. The method is based on a domain decomposition that is dictated by singular perturbation analysis. The analysis is used to determine regions where certain reduced equations may be solved in place of the full equation. Parallelism is evident at two levels. Domain decomposition provides parallelism at the highest level, and within each domain there is ample opportunity to exploit parallelism. Run time results demonstrate the viability of the method.

  18. Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data

    DOEpatents

    Grider, Gary A.; Poole, Stephen W.

    2015-09-01

    Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.

  19. Parallel Digital Phase-Locked Loops

    NASA Technical Reports Server (NTRS)

    Sadr, Ramin; Shah, Biren N.; Hinedi, Sami M.

    1995-01-01

    Wide-band microwave receivers of the proposed type include digital phase-locked loops in which band-pass filtering and down-conversion of input signals are implemented by banks of multirate digital filters operating in parallel. They are called "parallel digital phase-locked loops" to distinguish them from other digital phase-locked loops. The systems are conceived as a cost-effective solution to the problem of filtering signals at the high sampling rates needed to accommodate wide input frequency bands. Each of the M filters processes 1/M of the signal spectrum.

  20. An asymptotic-preserving Lagrangian algorithm for the time-dependent anisotropic heat transport equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.

    2014-09-01

    We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while the second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio χ⊥/χ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = χ⊥L∥²/(χ∥L⊥²) → 0 (with L∥ and L⊥ the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.

  1. A bioconvection model for a squeezing flow of nanofluid between parallel plates in the presence of gyrotactic microorganisms

    NASA Astrophysics Data System (ADS)

    Bin-Mohsin, Bandar; Ahmed, Naveed; Adnan; Khan, Umar; Tauseef Mohyud-Din, Syed

    2017-04-01

    This article deals with the bioconvection flow in a parallel-plate channel. The plates are parallel and the flowing fluid is saturated with nanoparticles; water is considered as the base fluid because microorganisms can survive only in water. A highly nonlinear and coupled system of partial differential equations presenting the model of bioconvection flow between parallel plates is reduced to a nonlinear and coupled system (the nondimensional bioconvection flow model) of ordinary differential equations with the help of feasible nondimensional variables. In order to find a convergent solution of the system, a semi-analytical technique called the variation of parameters method (VPM) is utilized. A numerical solution is also computed, for which the fourth-order Runge-Kutta scheme is employed. The two solutions are compared over the domain of interest and found to be in excellent agreement. The influence of various parameters on the nondimensional velocity, temperature, concentration and density of the motile microorganisms is discussed for both suction and injection cases. An almost inconsequential influence of the thermophoretic and Brownian motion parameters on the temperature field is observed. Interesting variations in the density of the motile microorganisms are observed as the bioconvection parameter varies in the suction and injection cases. Concluding remarks are given at the end of the article.

  3. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

    PubMed

    Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
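
    The two primitives are easy to demonstrate. The sketch below (a toy word count with Python's multiprocessing standing in for Hadoop's distributed map phase and shuffle; the records are invented) maps chunks of records to partial counts in parallel and reduces them by merging.

      from collections import Counter
      from functools import reduce
      from multiprocessing import Pool

      def map_phase(chunk):
          """Map: emit (term, count) pairs for one chunk of records."""
          return Counter(word for line in chunk for word in line.lower().split())

      def reduce_phase(c1, c2):
          """Reduce: merge partial counts (Hadoop would shuffle by key first)."""
          return c1 + c2

      if __name__ == "__main__":
          records = ["diabetes type 2", "type 2 screening", "diabetes screening"] * 1000
          chunks = [records[i::4] for i in range(4)]      # split across 4 workers
          with Pool(4) as pool:
              partial = pool.map(map_phase, chunks)       # parallel map phase
          print(reduce(reduce_phase, partial).most_common(3))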

  4. Inactivation of Heat Adapted and Chlorine Adapted Listeria Monocytogenes ATCC 7644 on Tomatoes Using Sodium Dodecyl Sulphate, Levulinic Acid and Sodium Hypochlorite Solution.

    PubMed

    Ijabadeniyi, Oluwatosin Ademola; Mnyandu, Elizabeth

    2017-04-13

    The effectiveness of sodium dodecyl sulphate (SDS), sodium hypochlorite solution and levulinic acid in reducing the survival of heat adapted and chlorine adapted Listeria monocytogenes ATCC 7644 was evaluated. The results against heat adapted L. monocytogenes revealed that sodium hypochlorite solution was the least effective, achieving log reductions of 2.75, 2.94 and 3.97 log colony forming units (CFU)/mL for 1, 3 and 5 minutes, respectively. SDS was able to achieve an 8 log reduction for both heat adapted and chlorine adapted bacteria. When used against chlorine adapted L. monocytogenes, sodium hypochlorite solution achieved log reductions of 2.76, 2.93 and 3.65 log CFU/mL for 1, 3 and 5 minutes, respectively. Levulinic acid on heat adapted bacteria achieved log reductions of 3.07, 2.78 and 4.97 log CFU/mL for 1, 3 and 5 minutes, respectively; on chlorine adapted bacteria it achieved log reductions of 2.77, 3.07 and 5.21 log CFU/mL for 1, 3 and 5 minutes, respectively. A mixture of 0.05% SDS and 0.5% levulinic acid on heat adapted bacteria achieved log reductions of 3.13, 3.32 and 4.79 log CFU/mL for 1, 3 and 5 minutes, while on chlorine adapted bacteria it achieved 3.20, 3.33 and 5.66 log CFU/mL, respectively. Increasing contact time also increased the log reduction for both test pathogens. A storage period of up to 72 hours resulted in progressive log reduction for both test pathogens. Results also revealed that there was a significant difference (P≤0.05) among contact times, storage times and sanitizers. Findings from this study can be used to select suitable sanitizers and contact times for heat and chlorine adapted L. monocytogenes in the fresh produce industry.

  5. An FPGA-based High Speed Parallel Signal Processing System for Adaptive Optics Testbed

    NASA Astrophysics Data System (ADS)

    Kim, H.; Choi, Y.; Yang, Y.

    In this paper a state-of-the-art FPGA (Field Programmable Gate Array) based high speed parallel signal processing system (SPS) for an adaptive optics (AO) testbed with a 1 kHz wavefront error (WFE) correction frequency is reported. The AO system consists of a Shack-Hartmann sensor (SHS), a deformable mirror (DM), a tip-tilt sensor (TTS), a tip-tilt mirror (TTM) and an FPGA-based high performance SPS to correct wavefront aberrations. The SHS is composed of 400 subapertures and the DM of 277 actuators in a Fried geometry, requiring an SPS with high speed parallel computing capability. In this study, the target WFE correction speed is 1 kHz; therefore, massive parallel computing capability is required, along with strict hard real-time constraints on the measurements from the sensors, the matrix computation latency for the correction algorithms, and the output of control signals for the actuators. In order to meet them, an FPGA based real-time SPS with parallel computing capability is proposed. In particular, the SPS is made up of a National Instruments (NI) real-time computer and five FPGA boards based on the state-of-the-art Xilinx Kintex-7 FPGA. Programming is done in NI's LabVIEW environment, providing flexibility when applying different algorithms for WFE correction. It also provides a faster programming and debugging environment compared to conventional ones. One of the five FPGAs is assigned to measure the TTS and calculate control signals for the TTM, while the other four are used to receive the SHS signal, calculate slopes for each subaperture and compute correction signals for the DM. With these parallel processing capabilities of the SPS, the overall closed-loop WFE correction speed of 1 kHz has been achieved. System requirements, architecture and implementation issues are described; furthermore, experimental results are also given.
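
    Per frame, the numerical core of such a loop is small: centroid slopes from the SHS and one reconstructor matrix-vector multiply feeding an integrator on the DM commands. The sketch below uses the abstract's dimensions (400 subapertures giving 800 x/y slopes, 277 actuators) but an invented random reconstructor matrix and gain, purely to show the shape of the 1 kHz computation.

      import numpy as np

      rng = np.random.default_rng(0)

      n_slopes, n_act = 800, 277                 # 400 subapertures x 2 axes; 277 actuators
      R = rng.standard_normal((n_act, n_slopes)) * 1e-2   # stand-in reconstructor matrix
      gain = 0.5                                 # integrator gain (illustrative)
      cmd = np.zeros(n_act)                      # DM actuator commands

      for frame in range(5):                     # each iteration = one 1 ms frame
          slopes = rng.standard_normal(n_slopes) # centroid slopes from the SHS
          cmd += gain * (R @ slopes)             # reconstruct and integrate
      print(round(float(np.linalg.norm(cmd)), 3))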

  6. Analysis of composite ablators using massively parallel computation

    NASA Technical Reports Server (NTRS)

    Shia, David

    1995-01-01

    In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations, developed for three sample problems: (1) transpiration cooling, (2) an ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions, and good agreement between the numerical and analytical solutions is found. A solution scheme based on the explicit finite difference method has several advantages: it incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, such a scheme needs very small time steps to maintain stability. A scheme based on the implicit finite difference method does not require very small time steps to maintain stability, but complex physics cannot be easily incorporated into the algorithm and the scheme is difficult to parallelize. A hybrid solution scheme is therefore developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid scheme is then applied to the ablative composite plate and restrained thermal growth problems, with the gas storage term included in the explicit pressure calculation of both problems. Results for the ablative composite plate are compared with previous numerical results that did not include the gas storage term: the through-thickness temperature distribution is not affected much by the gas storage term, but the through-thickness pressure and stress distributions, and the extent of chemical reactions, differ from the previous results. Two types of chemical reaction models are used in the restrained thermal growth problem: (1) pressure-independent Arrhenius-type rate equations and (2) pressure-dependent Arrhenius-type rate equations. The numerical results are compared to experimental results, and the pressure-dependent model captures the trend better than the pressure-independent one. Finally, a performance study of the hybrid algorithm using the ablative composite plate problem shows good speedup on the CM-5: with 32 CPUs, the speedup is 20. The efficiency of the algorithm is found to be a function of the size and execution time of a given problem and of the effective parallelization of the algorithm, and there appears to be an optimum number of CPUs for a given problem.
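
    The critical-time-scale selection logic generalizes beyond ablators. The minimal sketch below applies it to a 1-D heat equation with fixed-value boundaries; this is only an illustration of switching between explicit and implicit updates on a stand-in problem, not the paper's ablator model.

      import numpy as np

      def hybrid_step(u, dt, dx, alpha):
          # Advance u_t = alpha * u_xx by dt. Use the cheap, easily
          # parallelizable explicit update when dt is below the explicit
          # stability limit; otherwise fall back to the unconditionally
          # stable implicit (backward Euler) solve.
          dt_crit = dx**2 / (2 * alpha)        # explicit stability limit
          r = alpha * dt / dx**2
          n = len(u)
          if dt <= dt_crit:
              un = u.copy()                    # explicit FTCS update
              un[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
              return un
          # Implicit step: tridiagonal system (dense solve shown for brevity).
          A = (np.diag((1 + 2 * r) * np.ones(n))
               + np.diag(-r * np.ones(n - 1), 1)
               + np.diag(-r * np.ones(n - 1), -1))
          A[0, :], A[-1, :] = 0, 0
          A[0, 0] = A[-1, -1] = 1              # fixed-value boundaries
          return np.linalg.solve(A, u)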

  7. Improve load balancing and coding efficiency of tiles in high efficiency video coding by adaptive tile boundary

    NASA Astrophysics Data System (ADS)

    Chan, Chia-Hsin; Tu, Chun-Chuan; Tsai, Wen-Jiin

    2017-01-01

    High efficiency video coding (HEVC) not only improves coding efficiency drastically compared to the well-known H.264/AVC but also introduces coding tools for parallel processing, one of which is tiles. Tile partitioning is allowed to be arbitrary in HEVC, but how to decide tile boundaries remains an open issue. An adaptive tile boundary (ATB) method is proposed to select a better tile partitioning that improves load balancing (ATB-LoadB) and coding efficiency (ATB-Gain) within a unified scheme. Experimental results show that, compared to ordinary uniform-space partitioning, the proposed ATB can save up to 17.65% of encoding time in parallel encoding scenarios and can reduce the total bit rate by up to 0.8%.
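
    As a loose illustration of the load-balancing half of this idea (not the paper's actual ATB algorithm), the sketch below places vertical tile boundaries greedily so that each tile's estimated load is near the average. The per-CTU-column costs are assumed inputs, e.g. encoding times measured on a previous frame.

      def balanced_boundaries(col_cost, n_tiles):
          # Place tile boundaries so each tile's estimated load (sum of
          # per-CTU-column costs) is as even as possible.
          # Returns the column index at which each tile starts.
          total = sum(col_cost)
          target = total / n_tiles
          bounds, acc, next_cut = [0], 0.0, target
          for i, c in enumerate(col_cost[:-1]):
              acc += c
              if acc >= next_cut and len(bounds) < n_tiles:
                  bounds.append(i + 1)
                  next_cut += target
          return bounds

      # Example: 12 CTU columns with uneven costs split into 3 tiles.
      print(balanced_boundaries([5, 5, 1, 1, 1, 1, 9, 9, 2, 2, 2, 2], 3))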

  8. EOS: A project to investigate the design and construction of real-time distributed Embedded Operating Systems

    NASA Technical Reports Server (NTRS)

    Campbell, R. H.; Essick, Ray B.; Johnston, Gary; Kenny, Kevin; Russo, Vince

    1987-01-01

    Project EOS is studying the problems of building adaptable real-time embedded operating systems for the scientific missions of NASA. Choices (A Class Hierarchical Open Interface for Custom Embedded Systems) is an operating system designed and built by Project EOS to address the following specific issues: the software architecture for adaptable embedded parallel operating systems, the achievement of high performance and real-time operation, the simplification of interprocess communication, the isolation of operating system mechanisms from one another, and the separation of mechanisms from policy decisions. Choices is written in C++ and runs on a ten-processor Encore Multimax. The system is intended for use in constructing specialized computer applications and for research on advanced operating system features, including fault tolerance and parallelism.

  9. Adaptation of a Multi-Block Structured Solver for Effective Use in a Hybrid CPU/GPU Massively Parallel Environment

    NASA Astrophysics Data System (ADS)

    Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain

    2014-11-01

    Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid CPU/GPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.
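
    A minimal sketch of what a meta-block style virtual partitioning might look like is given below: oversized blocks are split into equal virtual pieces, which are then assigned greedily to ranks. The splitting rule, the greedy assignment and all names are assumptions for illustration, not NUMECA's implementation.

      def virtual_partition(block_sizes, n_ranks):
          # Split any block larger than the average per-rank load into
          # smaller "meta-blocks", then assign meta-blocks to ranks
          # greedily (largest first) to keep the maximum rank load low.
          avg = sum(block_sizes) / n_ranks
          meta = []
          for b, size in enumerate(block_sizes):
              n_split = max(1, round(size / avg))
              meta += [(size / n_split, b)] * n_split   # equal virtual pieces
          loads = [0.0] * n_ranks
          assign = [[] for _ in range(n_ranks)]
          for size, b in sorted(meta, reverse=True):
              r = loads.index(min(loads))               # least-loaded rank
              loads[r] += size
              assign[r].append(b)
          return assign, loads

      # Example: four uneven blocks distributed over three ranks.
      assign, loads = virtual_partition([900, 300, 300, 300], 3)
      print(assign, loads)

    Because the splitting is virtual, the user-facing grid can remain a single unpartitioned topology, which is the transparency property the abstract emphasizes.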

  10. Three dimensional adaptive mesh refinement on a spherical shell for atmospheric models with Lagrangian coordinates

    NASA Astrophysics Data System (ADS)

    Penner, Joyce E.; Andronova, Natalia; Oehmke, Robert C.; Brown, Jonathan; Stout, Quentin F.; Jablonowski, Christiane; van Leer, Bram; Powell, Kenneth G.; Herzog, Michael

    2007-07-01

    One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL.
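
    The sketch below shows the refinement skeleton of a block-structured AMR tree on a longitude-latitude patch. It is a toy stand-in for the spherical AMR library described above; the error indicator, the threshold and the four-way splitting are all assumed for illustration.

      class Block:
          # Minimal adaptive block on a lon-lat patch.
          def __init__(self, lon0, lon1, lat0, lat1, level=0):
              self.bounds = (lon0, lon1, lat0, lat1)
              self.level = level
              self.children = []

          def refine(self, indicator, max_level):
              # Split into four children wherever the user-supplied error
              # indicator (a function of the block bounds) is too large.
              lon0, lon1, lat0, lat1 = self.bounds
              if self.level >= max_level or indicator(self.bounds) < 1.0:
                  return
              lonm, latm = (lon0 + lon1) / 2, (lat0 + lat1) / 2
              self.children = [
                  Block(a, b, c, d, self.level + 1)
                  for a, b in ((lon0, lonm), (lonm, lon1))
                  for c, d in ((lat0, latm), (latm, lat1))
              ]
              for child in self.children:
                  child.refine(indicator, max_level)

      # Hypothetical indicator: refine blocks touching the southern
      # hemisphere (a stand-in for a convective-activity measure).
      root = Block(0.0, 360.0, -90.0, 90.0)
      root.refine(lambda b: 2.0 if b[2] < 0 else 0.5, max_level=3)

    In the real library, neighbor lists, interpolation at coarse-fine interfaces and the redistribution of blocks across processors are all managed alongside this refinement step.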

  11. Ecological adaptation of diverse honey bee (Apis mellifera) populations.

    PubMed

    Parker, Robert; Melathopoulos, Andony P; White, Rick; Pernal, Stephen F; Guarna, M Marta; Foster, Leonard J

    2010-06-15

    Honey bees are complex eusocial insects that provide a critical contribution to human agricultural food production. Their natural migration has selected for traits that increase fitness within geographical areas, but in parallel their domestication has selected for traits that enhance productivity and survival under local conditions. Elucidating the biochemical mechanisms of these local adaptive processes is a key goal of evolutionary biology. Proteomics provides tools unique among the major 'omics disciplines for identifying the mechanisms employed by an organism in adapting to environmental challenges. Through proteome profiling of adult honey bee midgut from geographically dispersed, domesticated populations, combined with multiple parallel statistical treatments, the data presented here suggest some of the major cellular processes involved in adapting to different climates. These findings provide insight into the molecular underpinnings that may confer an advantage to honey bee populations. Significantly, the major energy-producing pathways of the mitochondria, the organelle most closely involved in heat production, were consistently elevated in bees that had adapted to colder climates. Conversely, up-regulation of protein metabolism capacity, from biosynthesis to degradation, had been selected for in bees from warmer climates. Overall, our results present a proteomic interpretation of expression polymorphisms between honey bee ecotypes and provide insight into molecular aspects of local adaptation or selection, with consequences for honey bee management and breeding. The implications of our findings extend beyond apiculture, as they underscore the need to consider the interdependence of animal populations and their agro-ecological context.

  12. Algorithms for parallel and vector computations

    NASA Technical Reports Server (NTRS)

    Ortega, James M.

    1995-01-01

    This is a final report on work performed under NASA grant NAG-1-1112-FOP during the period March 1990 through February 1995. Four major topics are covered: (1) solution of nonlinear Poisson-type equations; (2) a parallel reduced-system conjugate gradient method; (3) orderings for conjugate gradient preconditioners; and (4) SOR as a preconditioner.

  13. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    PubMed

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To avoid the tedious task of reconstructing gene networks by experimentally testing the possible interactions between genes, it has become a trend to adopt automated reverse engineering procedures instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, two important issues must be addressed: premature convergence and high computational cost. To tackle the former and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel-model evolutionary algorithms. To overcome the latter and to speed up the computation, the mechanism of cloud computing is a promising solution; most popular is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms to infer large gene networks. This work presents a practical framework to infer large gene networks by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments show that our parallel approach can be successfully used to infer networks with the desired behaviors and that the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and perform better than the widely used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel-model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way to infer large networks.
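
    The property that makes population-based methods map well onto both multiprocessors and MapReduce-style execution is that fitness evaluations are independent of one another. The sketch below shows that core with Python's multiprocessing as a single-machine stand-in for Hadoop mappers; the fitness stub, population shape and parameters are all assumptions, not the paper's GA-PSO method.

      from multiprocessing import Pool
      import random

      def fitness(params):
          # Hypothetical fitness: how well a candidate parameter set
          # reproduces observed gene-expression profiles (stub shown).
          return -sum(p * p for p in params)

      def evaluate_population(pop, workers=4):
          # The parallelizable core of any population-based method:
          # every candidate is scored independently, so the evaluations
          # map directly onto worker pools (or MapReduce mappers).
          with Pool(workers) as pool:
              return pool.map(fitness, pop)

      if __name__ == "__main__":
          population = [[random.uniform(-1, 1) for _ in range(10)]
                        for _ in range(32)]
          print(max(evaluate_population(population)))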

  14. LDRD final report on massively-parallel linear programming : the parPCx system.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver, called parPCx, and give preliminary computational results. We summarize a number of issues related to the efficient parallel solution of LPs with interior-point methods, including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.
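
    To see why the linear solve dominates, the sketch below shows the normal-equations system a Mehrotra-style interior-point iteration assembles and solves at every step. The dense solve and the toy data are assumptions for illustration; a parPCx-style solver would instead distribute a sparse version of this system across processors.

      import numpy as np

      def normal_equations_step(A, x, s, r):
          # Each interior-point iteration solves a system with matrix
          # A * D^2 * A^T, where D^2 = diag(x_i / s_i) changes every
          # iteration, so the system must be re-assembled and re-solved.
          d2 = x / s                         # D^2 diagonal entries
          M = A @ np.diag(d2) @ A.T          # normal-equations matrix
          return np.linalg.solve(M, r)       # step in the dual variables

      # Tiny example: 2 constraints, 4 primal variables.
      A = np.array([[1.0, 1.0, 1.0, 0.0],
                    [0.0, 1.0, 2.0, 1.0]])
      x = np.ones(4)
      s = np.ones(4)
      print(normal_equations_step(A, x, s, np.array([1.0, 2.0])))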

  15. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

    PubMed Central

    2014-01-01

    Background To avoid the tedious task of reconstructing gene networks by experimentally testing the possible interactions between genes, it has become a trend to adopt automated reverse engineering procedures instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, two important issues must be addressed: premature convergence and high computational cost. To tackle the former and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel-model evolutionary algorithms. To overcome the latter and to speed up the computation, the mechanism of cloud computing is a promising solution; most popular is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms to infer large gene networks. Results This work presents a practical framework to infer large gene networks by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments show that our parallel approach can be successfully used to infer networks with the desired behaviors and that the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and perform better than the widely used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel-model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way to infer large networks. PMID:24428926

  16. Waveform inversion for 3-D earth structure using the Direct Solution Method implemented on vector-parallel supercomputer

    NASA Astrophysics Data System (ADS)

    Hara, Tatsuhiko

    2004-08-01

    We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation, with maximum angular degree 16. We find significant low velocities under South Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, suggesting deep roots for these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since the resolution for Q is not good, we develop a new technique in which power spectra are used as data for the inversion. We find good correlation between long-wavelength patterns of Vs and Q in the transition zone, such as high Vs and high Q under the western Pacific.

  17. Hybrid Solution-Adaptive Unstructured Cartesian Method for Large-Eddy Simulation of Detonation in Multi-Phase Turbulent Reactive Mixtures

    DTIC Science & Technology

    2012-03-27

    Report documenting a hybrid solution-adaptive unstructured Cartesian method for large-eddy simulation of detonation in multi-phase turbulent reactive mixtures (CCL Report TR-2012-03-03; Grant FA9550-...). Listed applications include pulse-detonation engines (PDE), stage separation, supersonic cavity oscillations, hypersonic aerodynamics, and detonation-induced structural response.

  18. On operational flood forecasting system involving 1D/2D coupled hydraulic model and data assimilation

    NASA Astrophysics Data System (ADS)

    Barthélémy, S.; Ricci, S.; Morel, T.; Goutal, N.; Le Pape, E.; Zaoui, F.

    2018-07-01

    In the context of hydrodynamic modeling, the use of 2D models is appropriate in areas where the flow is not one-dimensional (confluence zones, flood plains). Nonetheless, the lack of field data and computational cost constraints limit the extensive use of 2D models for operational flood forecasting. Multi-dimensional coupling offers a solution, with 1D models where the flow is one-dimensional and local 2D models where needed. This approach allows complex processes to be represented in the 2D models, while the simulated hydraulic state is significantly better than that of a full 1D model. In this study, coupling is implemented between three 1D sub-models and a local 2D model for a confluence on the Adour river (France). A Schwarz algorithm is implemented to guarantee the continuity of the variables at the 1D/2D interfaces, while in situ observations are assimilated in the 1D sub-models to improve results and forecasts in operational mode, as carried out by the French flood forecasting services. An implementation of the coupling and data assimilation (DA) solution with domain decomposition and task/data parallelism is proposed so that it is compatible with operational constraints. The coupling with the 2D model improves the simulated hydraulic state compared to a global 1D model, and DA improves results in both the 1D and 2D areas.
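
    The interface exchange can be pictured with the minimal fixed-point sketch below, written under strong assumptions: each hydraulic solver is reduced to a function that maps an interface value (e.g. a water level) to the value it produces at the same interface, and a scalar interface state is iterated until it stops moving. The solver stand-ins are illustrative only; the real system exchanges full interface states between the 1D and 2D codes.

      def schwarz_couple(solve_1d, solve_2d, z0, tol=1e-6, max_iter=50):
          # Schwarz-type iteration: each sub-model is solved with the
          # other's latest interface value as its boundary condition,
          # and the exchange repeats until the interface converges.
          z = z0
          for it in range(max_iter):
              z_mid = solve_1d(z)      # 1D solve, 2D's interface value as BC
              z_new = solve_2d(z_mid)  # 2D solve, 1D's interface value as BC
              if abs(z_new - z) < tol:
                  return z_new, it + 1
              z = z_new
          return z, max_iter

      # Toy contractions standing in for the two solvers (fixed point 2.0).
      z, n = schwarz_couple(lambda z: 0.5 * z + 1.0,
                            lambda z: 0.5 * z + 1.0, 0.0)
      print(z, n)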

  19. Efficient parallel resolution of the simplified transport equations in mixed-dual formulation

    NASA Astrophysics Data System (ADS)

    Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.

    2011-03-01

    A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine models are difficult to treat with our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian-based domain decomposition method leads to poor parallel efficiency because of an increase in the number of power iterations [1]. In order to obtain high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order), and it can be significantly optimized for the matching-grid case. The good behavior of the new parallelization scheme is demonstrated for the matching-grid case on several hundred nodes for computations based on a pin-by-pin discretization.
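
    A compact sketch of the eigenvalue iteration at the heart of such a reactivity computation is given below, assuming dense toy operators: M stands in for the (simplified) transport operator and F for the fission source operator, and the M-solve is the step that gets parallelized over subdomains. The operator names and stopping rule are illustrative, not the paper's code.

      import numpy as np

      def dominant_k(M, F, tol=1e-10, max_iter=500):
          # Power iteration on G = M^{-1} F: the reactivity k is the
          # dominant eigenvalue of the generalized problem F*phi = k*M*phi.
          phi = np.ones(M.shape[0])
          phi /= np.linalg.norm(phi)
          k = 0.0
          for _ in range(max_iter):
              psi = np.linalg.solve(M, F @ phi)   # the parallelized solve
              k_new = phi @ psi                   # Rayleigh estimate (||phi|| = 1)
              psi /= np.linalg.norm(psi)
              if abs(k_new - k) < tol:
                  return k_new, psi
              k, phi = k_new, psi
          return k, phi

      # Toy operators: with F = I, k converges to 1/lambda_min(M).
      M = np.array([[4.0, -1.0, 0.0],
                    [-1.0, 4.0, -1.0],
                    [0.0, -1.0, 4.0]])
      print(dominant_k(M, np.eye(3))[0])

    The observation motivating the paper's rework is visible here: anything that increases the number of these outer iterations (as the first domain-decomposition attempt did) multiplies the cost of every solve inside the loop.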

  20. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
