interactive parallel programming: Topics by Science.gov

Sample records for interactive parallel programming

An interactive parallel programming environment applied in atmospheric science

NASA Technical Reports Server (NTRS)

vonLaszewski, G.

1996-01-01

This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

2016-08-01

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). 
These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
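To make concrete what a "simple, sequential and unoptimized program of O(N^2) calculation cost" looks like, here is a minimal sketch of a direct-sum gravitational kernel. It is not FDPS code (FDPS is a separate framework with its own interface); the function name `direct_sum_accelerations`, the softening length `eps`, and the unit choice `G = 1` are assumptions made only for illustration.

```python
import math


def direct_sum_accelerations(positions, masses, eps=1e-3, G=1.0):
    """Return the gravitational acceleration on every particle.

    Naive all-pairs loop: each of the N particles visits the other N - 1,
    so the cost grows as O(N^2). `eps` is a Plummer softening length
    (an illustrative assumption) that avoids division by zero.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # Acceleration on i points toward j and scales with m_j / r^3.
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
            acc[i][2] += G * masses[j] * dz * inv_r3
    return acc
```

Tree or multipole methods, as mentioned in the abstract, replace the inner loop over all `j` with a traversal that groups distant particles, which is where the reduction below O(N^2) comes from.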
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

2001-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

1999-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Parallel Rendering of Large Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Garbutt, Alexander E.

2005-01-01

Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphic hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphic hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism System using a time-varying dataset from selected JPL applications.
Programming parallel architectures: The BLAZE family of languages

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Experiences with hypercube operating system instrumentation

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Rudolph, David C.

1989-01-01

The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.
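As a rough illustration of how event traces can be reduced to summary statistics, the sketch below counts events globally and per processor. The record layout `(timestamp, processor, kind)` and the function name `summarize_trace` are hypothetical; the abstract does not describe the trace format used by the instrumentation system.

```python
from collections import Counter, defaultdict


def summarize_trace(events):
    """Aggregate a list of (timestamp, processor, kind) trace records.

    Returns a pair: a Counter of event kinds over the whole run (the
    "global view"), and a dict mapping each processor to its own counts.
    """
    per_event = Counter()
    per_proc = defaultdict(Counter)
    for _timestamp, proc, kind in events:
        per_event[kind] += 1
        per_proc[proc][kind] += 1
    return per_event, {proc: dict(counts) for proc, counts in per_proc.items()}
```

A visualization tool would typically start from aggregates like these, or from the raw timestamps, to draw per-processor timelines.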
Cellular automata with object-oriented features for parallel molecular network modeling.

PubMed

Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

2005-06-01

Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks. Index Terms-Cellular automata, modeling, molecular network, object-oriented.
A mechanism for efficient debugging of parallel programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, B.P.; Choi, J.D.

1988-01-01

This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

PubMed

Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

2014-07-05

A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048^3 grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
Visualization and Tracking of Parallel CFD Simulations

NASA Technical Reports Server (NTRS)

Vaziri, Arsi; Kremenetsky, Mark

1995-01-01

We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate -> store -> visualize' post-processing approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Busbey, A.B.

Seismic Processing Workshop, a program by Parallel Geosciences of Austin, TX, is discussed in this column. The program is a high-speed, interactive seismic processing and computer analysis system for the Apple Macintosh II family of computers. Also reviewed in this column are three products from Wilkerson Associates of Champaign, IL. SubSide is an interactive program for basin subsidence analysis; MacFault and MacThrustRamp are programs for modeling faults.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

NASA Technical Reports Server (NTRS)

Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;

1998-01-01

Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architecture-independent parallel applications is presented.

Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hirata, So

2003-11-20

We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
An Empirical Development of Parallelization Guidelines for Time-Driven Simulation

DTIC Science & Technology

1989-12-01

wives, who though not Cub fans, put on a good show during our trip to watch some games. I would also like to recognize the help of my professors at...program parallelization. In this research effort a Ballistic Missile Defense (BMD) time driven simulation program, developed by DESE Research and...continuously, or continuously with discrete changes superimposed. The distinguishing feature of these simulations is the interaction between discretely
A Model for Speedup of Parallel Programs

DTIC Science & Technology

1997-01-01

Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static
Computer-Aided Parallelizer and Optimizer

NASA Technical Reports Server (NTRS)

Jin, Haoqiang

2011-01-01

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
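The distinction between private, reduction, and shared variables mentioned above can be sketched outside FORTRAN and OpenMP. The Python analogue below is not CAPO output and does not use OpenMP; the function names and the use of a thread pool are assumptions chosen only to show the roles the three variable classes play in a parallel loop.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum_of_squares(data, start, stop):
    """Process one slice of the loop's iteration space."""
    partial = 0.0  # private: each worker has its own accumulator
    for i in range(start, stop):  # i is private to the worker as well
        partial += data[i] * data[i]  # data is shared and only read
    return partial


def parallel_sum_of_squares(data, workers=4):
    """Split the loop across workers, then combine the partial results."""
    n = len(data)
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: chunk_sum_of_squares(data, b[0], b[1]), bounds)
        # Reduction: the per-worker partial sums are merged into one value.
        return sum(partials)
```

In a directive-based setting the same loop would stay sequential in the source, and a tool or programmer would have to classify `partial` as a reduction variable and `i` as private before the loop could safely run in parallel.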
A CS1 pedagogical approach to parallel thinking

NASA Astrophysics Data System (ADS)

Rague, Brian William

Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. 
The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of any initial three-week CS1 level module when compared with student comprehension levels just prior to starting the course. Survey results measured during the ninth week of the course reveal that performance levels remained high compared to pre-course performance scores. A second result produced by this study reveals no statistically significant interaction effect between the intervention method and student performance as measured by the evaluation instrument over three separate testing periods. However, visual inspection of survey score trends and the low p-value generated by the interaction analysis (0.062) indicate that further studies may verify improved concept retention levels for the lecture w/PAT group.
CRITIC2: A program for real-space analysis of quantum chemical interactions in solids

NASA Astrophysics Data System (ADS)

Otero-de-la-Roza, A.; Johnson, Erin R.; Luaña, Víctor

2014-03-01

We present CRITIC2, a program for the analysis of quantum-mechanical atomic and molecular interactions in periodic solids. This code, a greatly improved version of the previous CRITIC program (Otero-de-la Roza et al., 2009), can: (i) find critical points of the electron density and related scalar fields such as the electron localization function (ELF), Laplacian, … (ii) integrate atomic properties in the framework of Bader’s Atoms-in-Molecules theory (QTAIM), (iii) visualize non-covalent interactions in crystals using the non-covalent interactions (NCI) index, (iv) generate relevant graphical representations including lines, planes, gradient paths, contour plots, atomic basins, … and (v) perform transformations between file formats describing scalar fields and crystal structures. CRITIC2 can interface with the output produced by a variety of electronic structure programs including WIEN2k, elk, PI, abinit, Quantum ESPRESSO, VASP, Gaussian, and, in general, any other code capable of writing the scalar field under study to a three-dimensional grid. CRITIC2 is parallelized, completely documented (including illustrative test cases) and publicly available under the GNU General Public License. Catalogue identifier: AECB_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECB_v2_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: yes No. of lines in distributed program, including test data, etc.: 11686949 No. of bytes in distributed program, including test data, etc.: 337020731 Distribution format: tar.gz Programming language: Fortran 77 and 90. Computer: Workstations. Operating system: Unix, GNU/Linux. Has the code been vectorized or parallelized?: Shared-memory parallelization can be used for most tasks. Classification: 7.3. Catalogue identifier of previous version: AECB_v1_0 Journal reference of previous version: Comput. Phys. Comm. 
180 (2009) 157 Nature of problem: Analysis of quantum-chemical interactions in periodic solids by means of atoms-in-molecules and related formalisms. Solution method: Critical point search using Newton’s algorithm, atomic basin integration using bisection, qtree and grid-based algorithms, diverse graphical representations and computation of the non-covalent interactions index on a three-dimensional grid. Additional comments: The distribution file for this program is over 330 Mbytes and therefore is not delivered directly when download or Email is requested. Instead an HTML file giving details of how the program can be obtained is sent. Running time: Variable, depending on the crystal and the source of the underlying scalar field.

Exploiting Symmetry on Parallel Architectures.

NASA Astrophysics Data System (ADS)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

NASA Astrophysics Data System (ADS)

Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

2017-11-01

In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang, et al., 2016) which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and made the computation of electrostatic interaction done thoroughly in GPU. The upgraded edition of CU-ENUF runs concurrently in a hybrid parallel way that enables the computation parallelizing on multiple computer nodes firstly, then further on the installed GPU in each computer. By this parallel strategy, the size of simulation system will be never restricted to the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize a CUNFFT in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Furthermore, the upgraded method is capable of computing electrostatic interactions for both the atomistic molecular dynamics (MD) and the dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method is able to not only present a good precision when setting suitable parameters, but also give an efficient way to compute electrostatic interactions for huge simulation systems. Program Files doi:http://dx.doi.org/10.17632/zncf24fhpv.1 Licensing provisions: GNU General Public License 3 (GPL) Programming language: C, C++, and CUDA C Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, which runs on particular computers equipped with NVIDIA GPUs. It has been tested on (a) single computer node with Intel(R) Core(TM) i7-3770@ 3.40 GHz (CPU) and GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations. 
Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space, and it takes up most of the total simulation time. Although the parallel method CU-ENUF (Yang et al., 2016) based on GPU has achieved a qualitative leap compared with previous methods in electrostatic interactions computation, the computation capability is limited to the throughput capacity of a single GPU for super-scale simulation system. Therefore, we should look for an effective method to handle the calculation of electrostatic interactions efficiently for a simulation system with super-scale size. Solution method: We constructed a hybrid parallel architecture, in which CPU and GPU are combined to accelerate the electrostatic computation effectively. Firstly, the simulation system is divided into many subtasks via domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation with each computer node corresponding to a particular subtask, and furthermore each subtask in one computer node will be executed in GPU in parallel efficiently. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Restrictions: The HP-ENUF is mainly oriented to super-scale system simulations, in which the performance superiority is shown adequately. However, for a small simulation system containing fewer than 10^6 particles, the multi-node mode has no apparent efficiency advantage over the single-node mode, and may even be less efficient because of the serious network delay among computer nodes. References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 
2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5. (4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.
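The domain-decomposition step described in the solution method above can be illustrated with a minimal sketch. It assumes a one-dimensional slab decomposition of a periodic box; the function name `assign_to_domains` and its parameters are hypothetical, and the sketch does not reproduce the actual HP-ENUF implementation, which additionally distributes the subtasks over MPI ranks and GPUs.

```python
def assign_to_domains(x_coords, box_length, n_domains):
    """Slab decomposition: map each particle index to the subdomain owning its x coordinate."""
    width = box_length / n_domains
    domains = [[] for _ in range(n_domains)]
    for idx, x in enumerate(x_coords):
        # Wrap into the periodic box, then clamp to guard against rounding at the upper edge.
        owner = min(int((x % box_length) / width), n_domains - 1)
        domains[owner].append(idx)
    return domains


# Five particles in a box of length 10 split into four slabs of width 2.5.
domains = assign_to_domains([0.5, 3.2, 7.9, 9.99, 5.0], box_length=10.0, n_domains=4)
print(domains)
```

In a hybrid scheme of the kind the abstract describes, each inner list would correspond to the particles handled by one compute node.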
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

PubMed

Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

2014-06-25

The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems.
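To give a sense of why exhaustive SNP-SNP evaluation is so expensive, the sketch below counts the unordered pairs among n SNPs, which is n(n-1)/2. The function name `count_snp_pairs` and the example value of 500,000 SNPs are illustrative assumptions, not figures taken from the study.

```python
from itertools import combinations


def count_snp_pairs(n_snps):
    """Number of unordered SNP pairs an exhaustive two-locus scan must evaluate."""
    return n_snps * (n_snps - 1) // 2


# Small case: enumerate the pairs explicitly and compare with the formula.
small_pairs = list(combinations(range(4), 2))
print(len(small_pairs), count_snp_pairs(4))

# Illustrative GWAS-sized case (assumed marker count).
print(count_snp_pairs(500_000))
```

Each pair is independent of the others, which is what makes the workload a natural fit for massively parallel hardware.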
Heterogeneous computing architecture for fast detection of SNP-SNP interactions

PubMed Central

2014-01-01

Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems. PMID:24964802
VISUAL AND AUDIO PRESENTATION IN MACHINE PROGRAMED INSTRUCTION. FINAL REPORT.

ERIC Educational Resources Information Center

ALLEN, WILLIAM H.

THIS STUDY WAS PART OF A LARGER RESEARCH PROGRAM AIMED TOWARD DEVELOPMENT OF PARADIGMS OF MESSAGE DESIGN. OBJECTIVES OF THREE PARALLEL EXPERIMENTS WERE TO EVALUATE INTERACTIONS OF PRESENTATION MODE, PROGRAM TYPE, AND CONTENT AS THEY AFFECT LEARNER CHARACTERISTICS. EACH EXPERIMENT USED 18 TREATMENTS IN A FACTORIAL DESIGN WITH RANDOMLY SELECTED…
PREMER: a Tool to Infer Biological Networks.

PubMed

Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R

2017-10-04

Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).
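As an illustration of the information-theoretic quantities mentioned above, the following is a minimal sketch of a plug-in mutual information estimate for two discrete variables. It is an assumption-laden toy, not PREMER's estimator: the function name `mutual_information` is hypothetical, and the toolbox's own algorithms (written in FORTRAN 90) may differ substantially.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(identical, independent)
```

Computing such a quantity for every pair (or triple) of network nodes is what makes the inference costly and worth parallelizing.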
LAMMPS framework for dynamic bonding and an application modeling DNA

NASA Astrophysics Data System (ADS)

Svaneborg, Carsten

2012-08-01

We have extended the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) to support directional bonds and dynamic bonding. The framework supports stochastic formation of new bonds, breakage of existing bonds, and conversion between bond types. Bond formation can be controlled to limit the maximal functionality of a bead with respect to various bond types. Concomitant with the bond dynamics, angular and dihedral interactions are dynamically introduced between newly connected triplets and quartets of beads, where the interaction type is determined from the local pattern of bead and bond types. When breaking bonds, all angular and dihedral interactions involving broken bonds are removed. The framework allows chemical reactions to be modeled, and we use it to simulate a simplistic, coarse-grained DNA model. The resulting DNA dynamics illustrates the power of the present framework. Catalogue identifier: AEME_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEME_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public Licence No. of lines in distributed program, including test data, etc.: 2 243 491 No. of bytes in distributed program, including test data, etc.: 771 Distribution format: tar.gz Programming language: C++ Computer: Single and multiple core servers Operating system: Linux/Unix/Windows Has the code been vectorized or parallelized?: Yes. The code has been parallelized by the use of MPI directives. RAM: 1 Gb Classification: 16.11, 16.12 Nature of problem: Simulating coarse-grain models capable of chemistry e.g. DNA hybridization dynamics. Solution method: Extending LAMMPS to handle dynamic bonding and directional bonds. Unusual features: Allows bonds to be created and broken while angular and dihedral interactions are kept consistent. 
Additional comments: The distribution file for this program is approximately 36 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead an html file giving details of how the program can be obtained is sent. Running time: Hours to days. The examples provided in the distribution take just seconds to run.
Towards the integration of medical informatics education for clinicians into the medical curriculum.

PubMed

Lungeanu, Diana; Tractenberg, Rochelle E; Bersan, Otilia S; Mihalas, George I

2009-01-01

In the context of an existing first year, one-semester mandatory course of medical informatics (MI) for medical students, we tested an interactive teaching approach in parallel with the traditional academic program. After six semesters (at the beginning of the clinical stage) we collected feedback from the former students in the two parallel programs (with anonymous questionnaires comprising both subjectively-rated items and open-ended questions). We conclude that an introductory course on information and communication technology and information skills can be useful at the beginning of the medical curriculum, while an interactive, problem-based-learning-type MI course should be included during the clinical stage. Early development of these skills, and their use/utility across the curriculum, are important aspects of integrating MI education into clinical training.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools

NASA Technical Reports Server (NTRS)

Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great effort on migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTools' ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.
Code Parallelization with CAPO: A User Manual

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

2001-01-01

A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Parallel line analysis: multifunctional software for the biomedical sciences

NASA Technical Reports Server (NTRS)

Swank, P. R.; Lewis, M. L.; Damron, K. L.; Morrison, D. R.

1990-01-01

An easy to use, interactive FORTRAN program for analyzing the results of parallel line assays is described. The program is menu driven and consists of five major components: data entry, data editing, manual analysis, manual plotting, and automatic analysis and plotting. Data can be entered from the terminal or from previously created data files. The data editing portion of the program is used to inspect and modify data and to statistically identify outliers. The manual analysis component is used to test the assumptions necessary for parallel line assays using analysis of covariance techniques and to determine potency ratios with confidence limits. The manual plotting component provides a graphic display of the data on the terminal screen or on a standard line printer. The automatic portion runs through multiple analyses without operator input. Data may be saved in a special file to expedite input at a future time.
Parallel aeroelastic computations for wing and wing-body configurations

NASA Technical Reports Server (NTRS)

Byun, Chansup

1994-01-01

The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an interactive library that greatly simplifies the parallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solutions.
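The kind of problem described above, solving Ax=b iteratively for a sparse A, can be sketched in a few lines. The example below is a serial conjugate gradient solver for a small symmetric positive-definite matrix stored row by row as (column, value) pairs. The choice of conjugate gradient, the storage layout, and the names `matvec` and `conjugate_gradient` are assumptions made for illustration; the abstract does not state which methods or data formats AZTEC uses, and this sketch does not call the AZTEC library.

```python
def matvec(rows, x):
    """Sparse matrix-vector product; rows[i] lists the (column, value) nonzeros of row i."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(rows, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is x = [1/11, 7/11].
rows = [[(0, 4.0), (1, 1.0)], [(0, 1.0), (1, 3.0)]]
b = [1.0, 2.0]
x = conjugate_gradient(rows, b)
print(x)
```

In a distributed setting the matrix rows and vector entries would be partitioned across processes, with the matrix-vector product and dot products requiring communication.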
Exploring types of play in an adapted robotics program for children with disabilities.

PubMed

Lindsay, Sally; Lam, Ashley

2018-04-01

Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO ® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation Educators and clinicians working with children who have disabilities should consider the potential of LEGO ® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.
Evolution and coevolution of developmental programs

NASA Astrophysics Data System (ADS)

Jacob, Christian

1999-09-01

The developmental processes of single organisms, such as growth and structure formation, can be described by parallel rewrite systems in the form of Lindenmayer systems, which also allow one to generate geometrical structures in 3D space using turtle interpretation. We present examples of L-systems for growth programs of plant-like structures. Evolution-based programming techniques are applied to design L-systems by Genetic L-system Programming (GLP), demonstrating how developmental programs for plants, exhibiting specific morphogenetic properties can be interactively bred or automatically evolved. Finally, we demonstrate coevolutionary effects among plant populations consisting of different species, interacting with each other, competing for resources like sunlight and nutrients, and evolving successful reproduction strategies in their specific environments.
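The parallel rewriting at the heart of a Lindenmayer system can be shown with a minimal sketch. It uses Lindenmayer's classic two-symbol "algae" system (A becomes AB, B becomes A) rather than any grammar from the paper; the function name `rewrite` is hypothetical, and turtle interpretation and the evolutionary design of rules are omitted.

```python
def rewrite(axiom, rules, steps):
    """Apply all production rules to every symbol simultaneously, `steps` times."""
    state = axiom
    for _ in range(steps):
        # Symbols without a rule are copied unchanged.
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


algae_rules = {"A": "AB", "B": "A"}
generations = [rewrite("A", algae_rules, n) for n in range(5)]
print(generations)  # ['A', 'AB', 'ABA', 'ABAAB', 'ABAABABA']
```

Because every symbol is replaced in the same step, the rewriting is parallel in the sense the abstract describes, in contrast to the one-symbol-at-a-time derivations of ordinary formal grammars.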
Performance of a parallel code for the Euler equations on hypercube computers

NASA Technical Reports Server (NTRS)

Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

1990-01-01

The performance of hypercubes was evaluated on a computational fluid dynamics problem, and the parallel environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
Elementary School Teachers as "Targets and Agents of Change": Teachers' Learning in Interaction with Reform Science Curriculum

ERIC Educational Resources Information Center

Metz, Kathleen E.

2009-01-01

This article examines teachers' perspectives on the challenges of using a science reform curriculum, as well as their learning in interaction with the curriculum and parallel professional development program. As case studies, I selected 4 veteran teachers of 2nd or 3rd grade, with varying science backgrounds (including 2 with essentially none).…
Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies

PubMed Central

Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang

2008-01-01

Background Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

NASA Astrophysics Data System (ADS)

Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

2012-12-01

The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores to provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. 
Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser; maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
Performance Metrics for Monitoring Parallel Program Executions

NASA Technical Reports Server (NTRS)

Sarukkai, Sekkar R.; Gotwais, Jacob K.; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

Existing tools for debugging performance of parallel programs either provide graphical representations of program execution or profiles of program executions. However, for performance debugging tools to be useful, such information has to be augmented with information that highlights the cause of poor program performance. Identifying the cause of poor performance requires not only determining the significance of various performance problems on the execution time of the program, but also considering the effect of interprocessor communications of individual source-level data structures. In this paper, we present a suite of normalized indices which provide a convenient mechanism for focusing on a region of code with poor performance and highlight the cause of the problem in terms of processors, procedures and data structure interactions. All the indices are generated from trace files augmented with data structure information. Further, we show with the help of examples from the NAS benchmark suite that the indices help in detecting potential causes of poor performance, based on augmented execution traces obtained by monitoring the program.

Understanding and Improving High-Performance I/O Subsystems

NASA Technical Reports Server (NTRS)

El-Ghazawi, Tarek A.; Frieder, Gideon; Clark, A. James

1996-01-01

This research program has been conducted in the framework of the NASA Earth and Space Science (ESS) evaluations led by Dr. Thomas Sterling. In addition to the many important research findings for NASA and the prestigious publications, the program has helped orient the doctoral research of two students towards parallel input/output in high-performance computing. Further, the experimental results in the case of the MasPar were very useful and helpful to MasPar, with whose technical management the P.I. has had many interactions. The contributions of this program are drawn from three experimental studies conducted on different high-performance computing testbeds/platforms, and are therefore presented in 3 different segments as follows: 1. Evaluating the parallel input/output subsystems of NASA high-performance computing testbeds, namely the MasPar MP-1 and MP-2; 2. Characterizing the physical input/output request patterns for NASA ESS applications, which used the Beowulf platform; and 3. Dynamic scheduling techniques for hiding I/O latency in parallel applications such as sparse matrix computations. This study also has been conducted on the Intel Paragon and has also provided an experimental evaluation for the Parallel File System (PFS) and parallel input/output on the Paragon. This report is organized as follows. The summary of findings discusses the results of each of the aforementioned 3 studies. Three appendices, each containing a key scholarly research paper that details the work in one of the studies, are included.
Critical interactions between the Global Fund-supported HIV programs and the health system in Ghana.

PubMed

Atun, Rifat; Pothapregada, Sai Kumar; Kwansah, Janet; Degbotse, D; Lazarus, Jeffrey V

2011-08-01

The support of global health initiatives in recipient countries has been vigorously debated. Critics are concerned that disease-specific programs may be creating vertical and parallel service delivery structures that to some extent undermine health systems. This case study of Ghana aimed to explore how the Global Fund-supported HIV program interacts with the health system there and to map the extent and nature of integration of the national disease program across 6 key health systems functions. Qualitative interviews of national stakeholders were conducted to understand the perceptions of the strengths and weaknesses of the relationship between Global Fund-supported activities and the health system and to identify positive synergies and unintended consequences of integration. Ghana has a well-functioning sector-wide approach to financing its health system, with a strong emphasis on integrated care delivery. Ghana has benefited from US $175 million of approved Global Fund support to address the HIV epidemic, accounting for almost 85% of the National AIDS Control Program budget. Investments in infrastructure, human resources, and commodities have enabled HIV interventions to increase exponentially. Global Fund-supported activities have been well integrated into key health system functions to strengthen them, especially financing, planning, service delivery, and demand generation. Yet, with governance and monitoring and evaluation functions, parallel structures to national systems have emerged, leading to inefficiencies. This case study demonstrates that interactions and integration are highly varied across different health system functions, and strong government leadership has facilitated the integration of Global Fund-supported activities within national programs.
Creating a Parallel Version of VisIt for Microsoft Windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whitlock, B J; Biagas, K S; Rawson, P L

2011-12-07

VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. 
We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
Image Understanding and Intelligent Parallel Systems

DTIC Science & Technology

1991-05-09

a common user interface for the interactive, graphical manipulation of those histories, and...Circuits and Systems, August 1987. Yap, S.-K. and M.L. Scott, "PenGuin: A language for reactive graphical user interface programming," to appear, INTERACT '90, Cambridge, United Kingdom, 1990. ...of up to a factor of 100 over single-workstation implementations. User interfaces to large multiprocessor computers are a difficult issue addressed
The science of computing - The evolution of parallel processing

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology that currently impedes the effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.
Parallel Worlds: Agile and Waterfall Differences and Similarities

DTIC Science & Technology

2013-10-01

development model, and it is deliberately shorter than the Agile Overview as most readers are assumed to be from the Traditional World. For a more in...process of DODI 5000 does not forbid the iterative incremental software development model with frequent end-user interaction, it requires heroics on...added). Today, many of the DOD’s large IT programs therefore continue to adopt program structures and software development models closely
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
JETSPIN: A specific-purpose open-source software for simulations of nanofiber electrospinning

NASA Astrophysics Data System (ADS)

Lauricella, Marco; Pontrelli, Giuseppe; Coluzza, Ivan; Pisignano, Dario; Succi, Sauro

2015-12-01

We present the open-source computer program JETSPIN, specifically designed to simulate the electrospinning process of nanofibers. Its capabilities are shown with proper reference to the underlying model, as well as a description of the relevant input variables and associated test-case simulations. The various interactions included in the electrospinning model implemented in JETSPIN are discussed in detail. The code is designed to exploit different computational architectures, from single to parallel processor workstations. This paper provides an overview of JETSPIN, focusing primarily on its structure, parallel implementations, functionality, performance, and availability.
Side-band mutual interactions in the magnetosphere

NASA Technical Reports Server (NTRS)

Chang, D. C. D.; Helliwell, R. A.; Bell, T. F.

1980-01-01

Sideband mutual interactions between VLF waves in the magnetosphere are investigated. Results of an experimental program involving the generation of sidebands by means of frequency shift keying are presented which indicate that the energetic electrons in the magnetosphere can interact only with sidebands generated by signals with short modulation periods. Using the value of the memory time during which electrons interact with the waves implied by the above result, it is estimated that the length of the electron interaction region in the magnetosphere is between 2000 and 4000 km. Sideband interactions are found to be similar to those between constant-frequency signals, exhibiting suppression and energy coupling. Results from a second sideband transmitting program show that for most cases the coherence bandwidth of sidebands is about 50 Hz. Sideband mutual interactions are then explained by the overlap of the ranges of the parallel velocity of the electrons which the sidebands organize, and the wave intensity in the interaction region is estimated to be 2.5-10 milli-gamma, in agreement with satellite measurements.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, the performance of parallel programs with compiler directives has improved greatly. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
cellGPU: Massively parallel simulations of dynamic vertex models

NASA Astrophysics Data System (ADS)

Sussman, Daniel M.

2017-10-01

Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations.
Program Files doi: http://dx.doi.org/10.17632/6j2cj29t3r.1
Licensing provisions: MIT
Programming language: CUDA/C++
Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate.
Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU.
Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
Parallelization of MRCI based on hole-particle symmetry.

PubMed

Suo, Bing; Zhai, Gaohong; Wang, Yubin; Wen, Zhenyi; Hu, Xiangqian; Li, Lemin

2005-01-15

The parallel implementation of a multireference configuration interaction (MRCI) program based on hole-particle symmetry is described. The platform used for the parallelization is an Intel-architecture cluster of 12 nodes connected by a Gigabit Ethernet switch; each node is equipped with two 2.4-GHz Xeon processors, 3 GB of memory, and a 36-GB disk. The dependence of speedup on molecular symmetries and task granularities is discussed. Test calculations show that when the number of nodes is doubled, the speedup factor is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h). The largest calculation performed on this cluster involves 5.6 × 10^8 CSFs.
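The per-doubling factors quoted above can be turned into a projected speedup and a parallel efficiency. The sketch below does that arithmetic under one loud assumption that the abstract does not state: that the same factor applies at every doubling. The function names are made up for this example.

```python
def projected_speedup(factor_per_doubling, n_nodes):
    """Speedup on n_nodes (a power of two), assuming a constant factor per doubling."""
    doublings = n_nodes.bit_length() - 1
    if n_nodes < 1 or (1 << doublings) != n_nodes:
        raise ValueError("n_nodes must be a power of two")
    return factor_per_doubling ** doublings


def efficiency(factor_per_doubling, n_nodes):
    """Parallel efficiency: projected speedup divided by the node count."""
    return projected_speedup(factor_per_doubling, n_nodes) / n_nodes


if __name__ == "__main__":
    # Factors taken from the abstract; the extrapolation to 8 nodes is an assumption.
    for label, factor in [("C1/Cs", 1.9), ("C2v", 1.65), ("D2h", 1.55)]:
        s = projected_speedup(factor, 8)
        e = efficiency(factor, 8)
        print(f"{label}: speedup on 8 nodes ~ {s:.2f}, efficiency ~ {e:.0%}")
```

For a single doubling the efficiency is simply the factor divided by 2, for example 1.9 / 2 = 0.95 for the C1 and Cs cases.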
Computer Science Techniques Applied to Parallel Atomistic Simulation

NASA Astrophysics Data System (ADS)

Nakano, Aiichiro

1998-03-01

Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) space-time multiresolution schemes; ii) a fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) a multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and an M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Pteros 2.0: Evolution of the fast parallel molecular analysis library for C++ and python.

PubMed

Yesylevskyy, Semen O

2015-07-15

Pteros is a high-performance open-source library for molecular modeling and analysis of molecular dynamics trajectories. Starting from version 2.0, Pteros is available for the C++ and Python programming languages with very similar interfaces. This makes it suitable for writing complex reusable programs in C++ and simple interactive scripts in Python alike. The new version improves the facilities for asynchronous trajectory reading and parallel execution of analysis tasks by introducing analysis plugins, which can be written in either C++ or Python in a completely uniform way. The high level of abstraction provided by analysis plugins greatly simplifies prototyping and implementation of complex analysis algorithms. Pteros is available for free under the Artistic License from http://sourceforge.net/projects/pteros/. © 2015 Wiley Periodicals, Inc.
NASA management of the Space Shuttle Program

NASA Technical Reports Server (NTRS)

Peters, F.

1975-01-01

The management system and management technology described have been developed to meet stringent cost and schedule constraints of the Space Shuttle Program. Management of resources available to this program requires control and motivation of a large number of efficient creative personnel trained in various technical specialties. This must be done while keeping track of numerous parallel, yet interdependent activities involving different functions, organizations, and products all moving together in accordance with intricate plans for budgets, schedules, performance, and interaction. Some techniques developed to identify problems at an early stage and seek immediate solutions are examined.
An Exploratory Analysis of Student-Community Interactions in Urban Agriculture

ERIC Educational Resources Information Center

Grossman, Julie; Sherard, Maximilian; Prohn, Seb M.; Bradley, Lucy; Goodell, L. Suzanne; Andrew, Katherine

2012-01-01

Urban agriculture initiatives are on the rise, providing healthy food while teaching a land ethic to youth. In parallel, increasing numbers of university graduates are obtaining Extension work requiring the effective communication of science in a diverse, urban, low-income setting. This study evaluates a pilot service-learning program, the…
Semantic Language Extensions for Implicit Parallel Programming

DTIC Science & Technology

2013-09-01

mobile CPU interacts with a GPU on the same device and a cloud based backend at a remote location presents endless possibilities for solving com...for his contribution to the compiler infrastructure. His creativity in solving research problems and expertise in architecting and implementing...5.5.1 Frontend...5.5.2 Backend
Parallel program debugging with flowback analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Choi, Jongdeok.

1989-01-01

This thesis describes the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors. The goal of the debugging system is to present to the programmer a graphical view of the dynamic program dependences while keeping the execution-time overhead low. The author first describes the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. Execution time overhead is kept low by recording only a small amount of trace during a program's execution. He uses semantic analysis and a technique called incremental tracing to keep the time and space overhead low. As part of the semantic analysis, he uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic information produced during execution time. The cornerstone of the incremental tracing concept is to generate a coarse trace during execution and to fill in incrementally, during the interactive portion of the debugging session, the gap between the information gathered in the coarse trace and the information needed to do the flowback analysis. Then, he describes how to extend the flowback analysis to parallel programs. The flowback analysis can span process boundaries; i.e., the most recent modification to a shared variable might be traced to a different process than the one that contains the current reference. The static and dynamic program dependence graphs of the individual processes are tied together with synchronization and data dependence information to form complete graphs that represent the entire program.
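The idea of tracing a value back to the events that produced it, possibly in another process, can be sketched as a walk over a dynamic dependence graph. The example below is a simplification and not the thesis's system: it assumes a complete, globally ordered trace of assignments (the thesis instead records a coarse trace and fills it in incrementally), and the trace format and the names `build_dependences` and `flowback` are hypothetical.

```python
def build_dependences(trace):
    """For each event, list the indices of the events that last wrote its inputs.

    `trace` is a list of (process, written_variable, read_variables) tuples
    in global execution order (an assumed, simplified format).
    """
    last_writer = {}
    deps = []
    for index, (_process, target, sources) in enumerate(trace):
        deps.append([last_writer[s] for s in sources if s in last_writer])
        last_writer[target] = index
    return deps


def flowback(trace, index):
    """Return the sorted indices of all events that event `index` depends on."""
    deps = build_dependences(trace)
    seen, stack = set(), [index]
    while stack:
        for d in deps[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return sorted(seen)


if __name__ == "__main__":
    trace = [
        ("P1", "a", []),          # 0: P1 writes shared variable a
        ("P2", "b", []),          # 1: P2 writes b
        ("P1", "c", ["a"]),       # 2: P1 computes c from a
        ("P2", "d", ["c", "b"]),  # 3: P2 computes d from c (written by P1) and b
    ]
    for i in flowback(trace, 3):
        print(i, trace[i])
```

Event 3 runs in P2 but depends on events 0 and 2 from P1, which is the cross-process case the abstract mentions.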
Multiscale Simulations of Magnetic Island Coalescence

NASA Technical Reports Server (NTRS)

Dorelli, John C.

2010-01-01

We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partition, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a system of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1997-01-01

While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gordon Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays for obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, and interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, performance prediction as well as exploitation of user supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, transitions in the underlying hardware and system software have forced the scientists to spend a large effort to migrate and recode their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers, from MPI/PVM on networks of workstations to the IBM SP2 and Cray T3D.

Parallel software support for computational structural mechanics

NASA Technical Reports Server (NTRS)

Jordan, Harry F.

1987-01-01

The parallel programming methodology known as the Force was applied, and two application issues were addressed. The first involves the efficiency of the implementation and its completeness in terms of satisfying the needs of other researchers implementing parallel algorithms. Support for, and interaction with, other Computational Structural Mechanics (CSM) researchers using the Force was the main issue, but some independent investigation of the Barrier construct, which is extremely important to overall performance, was also undertaken. Another efficiency issue which was addressed was that of relaxing the strong synchronization condition imposed on the self-scheduled parallel DO loop. The Force was extended by the addition of logical conditions to the cases of a parallel case construct and by the inclusion of a self-scheduled version of this construct. The second issue involved applying the Force to the parallelization of finite element codes such as those found in the NICE/SPAR testbed system. One of the more difficult problems encountered is determining what information in COMMON blocks is actually used outside of a subroutine and when a subroutine uses a COMMON block merely as scratch storage for internal temporary results.
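The two constructs named above, a self-scheduled parallel loop and a barrier, can be sketched with Python threads. This is not the Force's Fortran syntax or implementation; `self_scheduled_for` and `demo` are hypothetical names, and in CPython the threads illustrate the scheduling pattern rather than real speedup.

```python
import threading


def self_scheduled_for(n_iterations, n_workers, body):
    """Run body(i) for every i in range(n_iterations).

    Each worker claims the next unassigned index whenever it becomes free
    (self-scheduling), then waits at a barrier until all workers are done.
    """
    next_index = 0
    lock = threading.Lock()
    barrier = threading.Barrier(n_workers)

    def worker():
        nonlocal next_index
        try:
            while True:
                with lock:  # claim one iteration atomically
                    i = next_index
                    next_index += 1
                if i >= n_iterations:
                    break
                body(i)
        finally:
            barrier.wait()  # strong synchronization at the end of the loop

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo():
    """Fill a list with squares using four workers."""
    squares = [0] * 20

    def body(i):
        squares[i] = i * i

    self_scheduled_for(20, 4, body)
    return squares


if __name__ == "__main__":
    print(demo())
```

The barrier at the end is the kind of strong synchronization condition the abstract refers to: no worker leaves the loop until every worker has finished its iterations.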
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed

Nadkarni, P M; Miller, P L

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
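Linda coordinates processes through a shared tuple space with operations conventionally called out, in, and rd. The sketch below is a minimal in-memory imitation using Python threads, written only to show the programming model; it is not the Linda or Network Linda implementation used in the study. The class `TupleSpace`, the method name `in_` (since `in` is a Python keyword), and the use of `None` as a wildcard are assumptions of this example.

```python
import threading


class TupleSpace:
    """A tiny tuple space: out() adds a tuple, in_() removes one, rd() reads one."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # None in the pattern matches any value in that position.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == t for p, t in zip(pattern, tup)
            ):
                return tup
        return None

    def rd(self, *pattern):
        """Block until a matching tuple exists, then return it without removing it."""
        with self._cond:
            while self._find(pattern) is None:
                self._cond.wait()
            return self._find(pattern)

    def in_(self, *pattern):
        """Block until a matching tuple exists, then remove and return it."""
        with self._cond:
            while self._find(pattern) is None:
                self._cond.wait()
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


def worker(space):
    """Take ("task", n) tuples and publish ("result", n, n*n); stop on negative n."""
    while True:
        _, n = space.in_("task", None)
        if n < 0:
            break
        space.out("result", n, n * n)


def demo():
    space = TupleSpace()
    thread = threading.Thread(target=worker, args=(space,))
    thread.start()
    for n in (1, 2, 3):
        space.out("task", n)
    space.out("task", -1)  # stop signal for the single worker
    results = sorted(space.in_("result", None, None) for _ in range(3))
    thread.join()
    return results


if __name__ == "__main__":
    print(demo())
```

Because producers and consumers only share tuples, the same worker code could in principle run on any machine that can reach the tuple space, which is the portability argument made for Linda in the abstract.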
Parallel programming with Easy Java Simulations

NASA Astrophysics Data System (ADS)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
DOVIS 2.0: An Efficient and Easy to Use Parallel Virtual Screening Tool Based on AutoDock 4.0

DTIC Science & Technology

2008-09-08

under the GNU General Public License. Background Molecular docking is a computational method that predicts how a ligand interacts with a receptor...Hence, it is an important tool in studying receptor-ligand interactions and plays an essential role in drug design. Particularly, molecular docking has...libraries from OpenBabel and setup a molecular data structure as a C++ object in our program. This makes handling of molecular structures (e.g., atoms
Proceedings of the workshop on Compilation of (Symbolic) Languages for Parallel Computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.; Tick, E.

1991-11-01

This report comprises the abstracts and papers for the talks presented at the Workshop on Compilation of (Symbolic) Languages for Parallel Computers, held October 31--November 1, 1991, in San Diego. These unrefereed contributions were provided by the participants for the purpose of this workshop; many of them will be published elsewhere in peer-reviewed conferences and publications. Our goal in planning this workshop was to bring together researchers from different disciplines with common problems in compilation. In particular, we wished to encourage interaction between researchers working in compilation of symbolic languages and those working on compilation of conventional, imperative languages. The fundamental problems facing researchers interested in compilation of logic, functional, and procedural programming languages for parallel computers are essentially the same. However, differences in the basic programming paradigms have led to different communities emphasizing different species of the parallel compilation problem. For example, parallel logic and functional languages provide dataflow-like formalisms in which control dependencies are unimportant. Hence, a major focus of research in compilation has been on techniques that try to infer when sequential control flow can safely be imposed. Granularity analysis for scheduling is a related problem. The single-assignment property leads to a need for analysis of memory use in order to detect opportunities for reuse. Much of the work in each of these areas relies on the use of abstract interpretation techniques.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed Central

Nadkarni, P. M.; Miller, P. L.

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Bilingual parallel programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.; Overbeek, R.

1990-01-01

Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To avoid the tedious task of reconstructing gene networks by experimentally testing the possible interactions between genes, it has become a trend to adopt automated reverse engineering procedures instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and that the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way of inferring large networks.
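The way a population-based search fits the MapReduce model can be sketched in a few lines: a map step scores each candidate in a partition of the population, and a reduce step keeps the best one. The example below runs sequentially in plain Python and does not use the Hadoop API; the toy squared-error fitness and all function names are assumptions, not the paper's GA-PSO method.

```python
from functools import reduce


def fitness(candidate, target):
    """Toy fitness: negative squared error between parameters and a target profile."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def map_phase(partition, target):
    """Mapper: score every candidate in one partition as (fitness, candidate) pairs."""
    return [(fitness(c, target), c) for c in partition]


def reduce_phase(pair_a, pair_b):
    """Reducer: keep whichever (fitness, candidate) pair has the higher fitness."""
    return max(pair_a, pair_b, key=lambda pair: pair[0])


def best_candidate(partitions, target):
    """Run the map step on each partition, then reduce all pairs to the best one."""
    scored = [pair for part in partitions for pair in map_phase(part, target)]
    return reduce(reduce_phase, scored)


if __name__ == "__main__":
    partitions = [
        [(0.0, 0.0), (1.0, 1.0)],  # candidates held by one mapper
        [(0.9, 2.1), (5.0, 5.0)],  # candidates held by another mapper
    ]
    print(best_candidate(partitions, target=(1.0, 2.0)))
```

On a cluster each partition would be scored by a different mapper, which is where the reduction in computation time reported above would come from; here the partitions are processed one after another.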
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background: To avoid the tedious task of reconstructing gene networks by experimentally testing the possible interactions between genes, it has become a trend to adopt automated reverse engineering procedures instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms for inferring large gene networks. Results: This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and that the computation time can be largely reduced. Conclusions: Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way of inferring large networks. PMID:24428926
The instant sequencing task: Toward constraint-checking a complex spacecraft command sequence interactively

NASA Technical Reports Server (NTRS)

Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Amador, Arthur V.; Spitale, Joseph N.

1993-01-01

Robotic spacecraft are controlled by sets of commands called 'sequences.' These sequences must be checked against mission constraints. Making our existing constraint checking program faster would enable new capabilities in our uplink process. Therefore, we are rewriting this program to run on a parallel computer. To do so, we had to determine how to run constraint-checking algorithms in parallel and create a new method of specifying spacecraft models and constraints. This new specification gives us a means of representing flight systems and their predicted response to commands which could be used in a variety of applications throughout the command process, particularly during anomaly or high-activity operations. This commonality could reduce operations cost and risk for future complex missions. Lessons learned in applying some parts of this system to the TOPEX/Poseidon mission will be described.
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Chen, P.-S.; Gumaste, U.; Lesoinne, M.; Stern, P.

1995-01-01

This research program deals with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a by-pass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled 3-component problem were developed in 1994. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 has been developed. It is planned to use the steady-state global solution provided by ENG10 as input to a localized three-dimensional FSI analysis for engine regions where aeroelastic effects may be important.
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1997-01-01

Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1996-01-01

This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1996 Computational Aeroscience meeting.
COMPARE/Radiology, an interactive Web-based radiology teaching program evaluation of user response.

PubMed

Wagner, Matthias; Heckemann, Rolf A; Nömayr, Anton; Greess, Holger; Bautz, Werner A; Grunewald, Markus

2005-06-01

The aim of this study is to assess user benefits of COMPARE/Radiology, a highly interactive World Wide Web-based training program for radiology, as perceived by its users. COMPARE/Radiology (http://www.idr.med.uni-erlangen.de/compare.htm), an interactive training program based on 244 teaching cases, was created by the authors and made publicly available on the Internet. An anonymous survey was conducted among users to investigate the composition of the program's user base and assess the acceptance of the training program. In parallel, Web access data were collected and analyzed using descriptive statistics. The group of responding users (n = 1370) consisted of 201 preclinical medical students (14.7%), 314 clinical medical students (22.9%), 359 residents in radiology (26.2%), and 205 users of other professions (14.9%). A majority of respondents (1230; 89%) rated the interactivity of COMPARE/Radiology as good or excellent. Many respondents use COMPARE/Radiology for self-study (971; 70%) and for teaching others (600; 43%). Web access statistics show an increase in number of site visits from 1248 in December 2002 to 4651 in April 2004. Users appreciate the benefits of COMPARE/Radiology. The interactive instructional design was rated positively by responding users. The popularity of the site is growing, evidenced by the number of network accesses during the observation period.
Extended computational kernels in a massively parallel implementation of the Trotter-Suzuki approximation

NASA Astrophysics Data System (ADS)

Wittek, Peter; Calderaro, Luca

2015-12-01

We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.
OSIRIS - an object-oriented parallel 3D PIC code for modeling laser and particle beam-plasma interaction

NASA Astrophysics Data System (ADS)

Hemker, Roy

1999-11-01

The advances in computational speed make it now possible to do full 3D PIC simulations of laser plasma and beam plasma interactions, but at the same time the increased complexity of these problems makes it necessary to apply modern approaches like object oriented programming to the development of simulation codes. We report here on our progress in developing an object oriented parallel 3D PIC code using Fortran 90. In its current state the code contains algorithms for 1D, 2D, and 3D simulations in cartesian coordinates and for 2D cylindrically-symmetric geometry. For all of these algorithms the code allows for a moving simulation window and arbitrary domain decomposition for any number of dimensions. Recent 3D simulation results on the propagation of intense laser and electron beams through plasmas will be presented.
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package

NASA Astrophysics Data System (ADS)

Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. 
W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin

2015-01-01

A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.
Equalizer: a scalable parallel rendering framework.

PubMed

Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

2009-01-01

Continuing improvements in CPU and GPU performances as well as increasing multi-core processor and cluster-based parallelism demand for flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.

Aeroelasticity of wing and wing-body configurations on parallel computers

NASA Technical Reports Server (NTRS)

Byun, Chansup

1995-01-01

The objective of this research is to develop computationally efficient methods for solving aeroelasticity problems on parallel computers. Both uncoupled and coupled methods are studied in this research. For the uncoupled approach, the conventional U-g method is used to determine the flutter boundary. The generalized aerodynamic forces required are obtained by the pulse transfer-function analysis method. For the coupled approach, the fluid-structure interaction is obtained by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
Partitioning problems in parallel, pipelined and distributed computing

NASA Technical Reports Server (NTRS)

Bokhari, S.

1985-01-01

The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
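To make the chain-partitioning problem concrete, the following Python sketch solves a deliberately simplified variant: a chain of modules is split into contiguous blocks, one block per processor, so that the heaviest block (the bottleneck) is as light as possible. This is a plain dynamic program written for illustration. It is not the Sum-Bottleneck path algorithm of the report, and it ignores communication costs. The function name, the module weights, and the rule that every processor receives at least one module are assumptions.

```python
def min_bottleneck_chain_partition(weights, processors):
    """Smallest achievable maximum block load when a chain of modules is
    cut into `processors` contiguous, non-empty blocks (illustrative only)."""
    n = len(weights)
    if processors < 1 or processors > n:
        raise ValueError("need 1 <= processors <= number of modules")

    # prefix[i] holds the total weight of the first i modules.
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    inf = float("inf")
    # best[k][i]: minimal bottleneck for the first i modules on k processors.
    best = [[inf] * (n + 1) for _ in range(processors + 1)]
    best[0][0] = 0
    for k in range(1, processors + 1):
        for i in range(1, n + 1):
            # The last block covers modules j..i-1 and runs on processor k.
            for j in range(k - 1, i):
                load = prefix[i] - prefix[j]
                candidate = max(best[k - 1][j], load)
                if candidate < best[k][i]:
                    best[k][i] = candidate
    return best[processors][n]


# Hypothetical execution costs of six chained modules, three processors.
bottleneck = min_bottleneck_chain_partition([4, 2, 7, 3, 5, 1], 3)
print(bottleneck)
```

The cubic-time loop is chosen for readability; the report's own algorithm targets efficient solutions under richer cost models.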
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.

1994-01-01

This research program deals with the application of high-performance computing methods for the analysis of complete jet engines. We have initiated this program by applying the two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition, and solution capabilities were successfully tested. We then focused attention on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion that results from these structural displacements. This is treated by a new arbitrary Lagrangian-Eulerian (ALE) technique that models the fluid mesh motion as that of a fictitious mass-spring network. New partitioned analysis procedures to treat this coupled three-component problem are developed. These procedures involved delayed corrections and subcycling. Preliminary results on the stability, accuracy, and MPP computational efficiency are reported.
Developing parallel GeoFEST(P) using the PYRAMID AMR library

NASA Technical Reports Server (NTRS)

Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.

2004-01-01

The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.
Composite structural materials

NASA Technical Reports Server (NTRS)

Ansell, G. S.; Loewy, R. G.; Wiberley, S. E.

1981-01-01

The composite aircraft program component (CAPCOMP) is a graduate level project conducted in parallel with a composite structures program. The composite aircraft program glider (CAPGLIDE) is an undergraduate demonstration project which has as its objectives the design, fabrication, and testing of a foot launched ultralight glider using composite structures. The objective of the computer aided design (COMPAD) portion of the composites project is to provide computer tools for the analysis and design of composite structures. The major thrust of COMPAD is in the finite element area with effort directed at implementing finite element analysis capabilities and developing interactive graphics preprocessing and postprocessing capabilities. The criteria for selecting research projects to be conducted under the innovative and supporting research (INSURE) program are described.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
Electronic and Solid State Sciences Program Summary, FY 1979.

DTIC Science & Technology

1979-01-01

studies of the interaction of the electromagnetic field with heat conducting and electrically non-conducting and conducting polarizable and magnetizable...Physical Review Letters, 42, 401-404 (1979). 9. "The low temperature electronic specific heat of disordered one dimensional chains", by P. S...technique exploits parallel photoheating and dc electrical-heating experiments. The CO laser hot electron studies have provided information on the
Department of Defense High Performance Computing Modernization Program. 2006 Annual Report

DTIC Science & Technology

2007-03-01

Department. We successfully completed several software development projects that introduced parallel, scalable production software now in use across the...imagined. They are developing and deploying weather and ocean models that allow our soldiers, sailors, marines and airmen to plan missions more effectively...and to navigate adverse environments safely. They are modeling molecular interactions leading to the development of higher energy fuels, munitions
Architecture Adaptive Computing Environment

NASA Technical Reports Server (NTRS)

Dorband, John E.

2006-01-01

Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

NASA Technical Reports Server (NTRS)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.

PubMed

Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

2015-07-01

Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
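The sliding-window computation at the heart of DFC analysis can be sketched in a few lines. The serial Python example below is only an illustration and is not the authors' OpenMP or CUDA implementation: it computes one Pearson correlation per window position for a pair of time courses. The function names, the window length, and the toy data are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # convention chosen here for constant windows
    return sxy / math.sqrt(sxx * syy)


def sliding_window_correlation(x, y, window):
    """One correlation value per window position (stride of one sample)."""
    if len(x) != len(y):
        raise ValueError("time courses must have equal length")
    return [
        pearson(x[start:start + window], y[start:start + window])
        for start in range(len(x) - window + 1)
    ]


# Toy time courses for two hypothetical brain regions.
region_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
region_b = [0.0, 2.0, 4.0, 3.0, 2.0, 1.0]
dfc = sliding_window_correlation(region_a, region_b, window=3)
print(dfc)
```

Each window, and each pair of regions, is computed independently of the others, which is the property a thread-based or block-based parallel mapping can exploit.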
A Comparison of Parallel and Integrated Models for Implementation of Routine HIV Screening in a Large, Urban Emergency Department.

PubMed

Hankin, Abigail; Freiman, Heather; Copeland, Brittney; Travis, Natasha; Shah, Bijal

2016-01-01

This study compared two approaches for implementation of non-targeted HIV screening in the emergency department (ED): (1) designated HIV counselors screening in parallel with ED care and (2) nurse-based screening integrated into patient triage. A retrospective analysis was performed to compare parallel and integrated screening models using data from the first 12 months of each program. Data for the parallel screening model were extracted from information collected by HIV test counselors and the electronic medical record (EMR). Integrated screening model data were extracted from the EMR and supplemented by data collected by HIV social workers during patient interaction. For both programs, data included demographics, HIV test offer, test acceptance or declination, and test result. A Z-test between two proportions was performed to compare screening frequencies and results. During the first 12 months of parallel screening, approximately 120,000 visits were made to the ED, with 3,816 (3%) HIV tests administered and 65 (2%) new diagnoses of HIV infection. During the first 12 months of integrated screening, 111,738 patients were triaged in the ED, with 16,329 (15%) patients tested and 190 (1%) new diagnoses. Integrated screening resulted in an increased frequency of HIV screening compared with parallel screening (0.15 tests per ED patient visit vs. 0.03 tests per ED patient visit, p<0.001) and an increase in the absolute number of new diagnoses (190 vs. 65), representing a slight decrease in the proportion of new diagnoses (1% vs. 2%, p=0.007). Non-targeted, integrated HIV screening, with test offer and order by ED nurses during patient triage, is feasible and resulted in an increased frequency of HIV screening and a threefold increase in the absolute number of newly identified HIV-positive patients.
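As an illustrative check of the comparison of new-diagnosis proportions, the following minimal Python sketch applies a pooled two-proportion Z-test to the counts quoted in the abstract (65 new diagnoses among 3,816 tests vs. 190 among 16,329 tests). The function name is hypothetical, and the pooled-variance, two-sided form is an assumption, since the abstract does not state which variant was used.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Counts taken from the abstract: parallel model first, integrated second.
z, p_value = two_proportion_z_test(65, 3816, 190, 16329)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumptions the computed p-value is of the same order as the p=0.007 reported for the difference in the proportion of new diagnoses.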
On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

PubMed Central

Calcagno, Cristina; Coppo, Mario

2014-01-01

The paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are demonstrated on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and for the effectiveness of the online analysis in capturing the behavior of biological systems on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327
On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

PubMed

Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo

2014-01-01

The paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are demonstrated on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and for the effectiveness of the online analysis in capturing the behavior of biological systems on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.
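The pipelined simulation/analysis structure described above can be illustrated with a small sketch. It does not use FastFlow (a C++ framework); it uses plain Python generators, a toy random walk in place of a stochastic biological model, and Welford's online algorithm as an assumed choice of streaming statistic. The two stages run interleaved in a single thread here, whereas a real pipeline would run them concurrently.

```python
import random

class RunningStats:
    """Welford's online mean/variance: one pass, constant memory."""
    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0  # population variance

def simulation_stage(n_trajectories, n_steps, seed=0):
    """Advance all trajectories in lockstep and stream out each time slice."""
    rng = random.Random(seed)
    state = [0.0] * n_trajectories
    for step in range(n_steps):
        state = [x + rng.gauss(0.0, 1.0) for x in state]  # toy random walk
        yield step, state

def analysis_stage(slices):
    """Consume slices as they arrive and emit one partial result per slice."""
    for step, state in slices:
        stats = RunningStats()
        for x in state:
            stats.push(x)
        yield step, stats.mean, stats.variance

results = list(analysis_stage(simulation_stage(n_trajectories=1000, n_steps=5)))
for step, mean, var in results:
    print(f"step {step}: mean={mean:+.3f} var={var:.3f}")
```

Because the analysis stage only ever holds one time slice and a few running totals, the full set of trajectories never has to be stored, which is the point of the online approach.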
A new parallel algorithm of MP2 energy calculations.

PubMed

Ishimura, Kazuya; Pulay, Peter; Nagase, Shigeru

2006-03-01

A new parallel algorithm has been developed for second-order Møller-Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of dispersion interaction. Tests on a moderate number of processors (2-16) show that the program has high CPU and parallel efficiency. Timings are presented for two relatively large molecules, taxol (C(47)H(51)NO(14)) and luciferin (C(11)H(8)N(2)O(3)S(2)), the former with the 6-31G* and 6-311G** basis sets (1,032 and 1,484 basis functions, 164 correlated orbitals), and the latter with the aug-cc-pVDZ and aug-cc-pVTZ basis sets (530 and 1,198 basis functions, 46 correlated orbitals). An MP2 energy calculation on C(130)H(10) (1,970 basis functions, 265 correlated orbitals) completed in less than 2 h on 128 processors.
Reducing neural network training time with parallel processing

NASA Technical Reports Server (NTRS)

Rogers, James L., Jr.; Lamarsh, William J., II

1995-01-01

Obtaining optimal solutions for engineering design problems is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Previous research has shown that a near optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive neural network. A new approach has been developed to further reduce this time. This approach decomposes a large neural network into many smaller neural networks that can be trained in parallel. Guidelines are developed to avoid some of the pitfalls when training smaller neural networks in parallel. These guidelines allow the engineer: to determine the number of nodes on the hidden layer of the smaller neural networks; to choose the initial training weights; and to select a network configuration that will capture the interactions among the smaller neural networks. This paper presents results describing how these guidelines are developed.
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

NASA Astrophysics Data System (ADS)

Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
Searching for globally optimal functional forms for interatomic potentials using genetic programming with parallel tempering.

PubMed

Slepoy, A; Peters, M D; Thompson, A P

2007-11-30

Molecular dynamics and other molecular simulation methods rely on a potential energy function, based only on the relative coordinates of the atomic nuclei. Such a function, called a force field, approximately represents the electronic structure interactions of a condensed matter system. Developing such approximate functions and fitting their parameters remains an arduous, time-consuming process, relying on expert physical intuition. To address this problem, a functional programming methodology was developed that may enable automated discovery of entirely new force-field functional forms, while simultaneously fitting parameter values. The method uses a combination of genetic programming, Metropolis Monte Carlo importance sampling and parallel tempering, to efficiently search a large space of candidate functional forms and parameters. The methodology was tested using a nontrivial problem with a well-defined globally optimal solution: a small set of atomic configurations was generated and the energy of each configuration was calculated using the Lennard-Jones pair potential. Starting with a population of random functions, our fully automated, massively parallel implementation of the method reproducibly discovered the original Lennard-Jones pair potential by searching for several hours on 100 processors, sampling only a minuscule portion of the total search space. This result indicates that, with further improvement, the method may be suitable for unsupervised development of more accurate force fields with completely new functional forms. Copyright (c) 2007 Wiley Periodicals, Inc.
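Two ingredients named in this abstract have standard textbook forms: the Lennard-Jones pair potential used as the target, and the replica-exchange acceptance rule used in parallel tempering. The sketch below shows only these two formulas; it is not the authors' genetic-programming code, and the role of "energy" in their search (a fitness measure for candidate functions) is an assumption here.

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def swap_probability(beta_i, energy_i, beta_j, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    beta is the inverse temperature; the swap is always accepted when it
    moves the lower energy to the colder (larger beta) replica.
    """
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)

r_min = 2 ** (1 / 6)  # position of the potential minimum for sigma = 1
print(lennard_jones(r_min))
print(swap_probability(beta_i=2.0, energy_i=-0.5, beta_j=1.0, energy_j=-1.0))
print(swap_probability(beta_i=2.0, energy_i=-1.0, beta_j=1.0, energy_j=-0.5))
```

Replicas at high temperature explore the search space broadly, and accepted swaps let promising candidates migrate to low temperature for refinement.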
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
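The idea of flattening can be shown in a few lines. This is a sketch of the general technique in Python, not the paper's C++ program transformation: a collection of collections is stored as one flat vector plus a list of segment lengths, so a nested elementwise operation becomes a single flat pass. The function names are illustrative.

```python
def flatten(nested):
    """Split a collection of collections into flat values plus segment lengths."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths

def unflatten(values, lengths):
    """Rebuild the nested shape from flat values and segment lengths."""
    nested, start = [], 0
    for n in lengths:
        nested.append(values[start:start + n])
        start += n
    return nested

def nested_foreach(nested, op):
    """Apply op to every inner element through one flat elementwise pass."""
    values, lengths = flatten(nested)
    mapped = [op(x) for x in values]  # the single, flat data-parallel step
    return unflatten(mapped, lengths)

print(nested_foreach([[1, 2, 3], [], [4, 5]], lambda x: x * x))
```

The flat pass has the same amount of work regardless of how unevenly the inner collections are sized, which is what makes the flattened form suitable for vector hardware.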

The BLAZE language: A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
The Electric Propulsion Interactions Code (EPIC): A Member of the NASA Space Environment and Effects Program (SEE) Toolset

NASA Technical Reports Server (NTRS)

Mikellides, Ioannis G.; Mandell, Myron J.; Kuharski, Robert A.; Davis, D. A.; Gardner, Barbara M.; Minor, Jody

2003-01-01

Science Applications International Corporation is currently developing the Electric Propulsion Interactions Code, EPIC, as part of a project sponsored by the Space Environments and Effects Program at NASA Marshall Space Flight Center. Now in its second year of development, EPIC is an interactive computer toolset that allows the construction of a 3-D spacecraft model, and the assessment of a variety of interactions between its subsystems and the plume from an electric thruster. This paper reports on the progress of EPIC, including the recently added ability to exchange results with the NASA Charging Analyzer Program, Nascap-2k. The capability greatly enhances EPIC's range of applicability. Expansion of the toolset's various physics models proceeds in parallel with the overall development of the software. Also presented are recent upgrades of the elastic scattering algorithm in the electric propulsion Plume Tool. These upgrades are motivated by the need to assess the effects of elastically scattered ions on the S/C for ion beam energies that exceed 1000 eV. Such energy levels are expected in future high-power (>10 kW) ion propulsion systems empowered by nuclear sources.
Adapting high-level language programs for parallel processing using data flow

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
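How a data flow graph might be deduced from a set of subprogram calls can be sketched as follows. This is not EASY-FLOW syntax, which the abstract does not show; the call table, the subprogram names, and the rule that each value has a single producer are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical subprogram calls: name -> (input values, output values).
CALLS = {
    "read_grid": ((), ("grid",)),
    "read_bc": ((), ("bc",)),
    "assemble": (("grid", "bc"), ("matrix",)),
    "solve": (("matrix",), ("solution",)),
    "plot_grid": (("grid",), ()),
}

def data_flow_graph(calls):
    """Map each call to the set of calls that produce one of its inputs."""
    producer = {out: name for name, (_, outs) in calls.items() for out in outs}
    return {name: {producer[i] for i in ins} for name, (ins, _) in calls.items()}

def parallel_stages(graph):
    """Group calls into stages; calls in the same stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages

print(parallel_stages(data_flow_graph(CALLS)))
```

Each stage corresponds to the large-grained parallelism the abstract describes: whole subprograms that can run side by side once their inputs exist.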
Use Computer-Aided Tools to Parallelize Large CFD Applications

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Yan, J.

2000-01-01

Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progress in hardware architectures and the increasing complexity of real applications in recent years, the problem has become even more severe. Today, scalability and high performance mostly involve handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, they can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome this deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization.
CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each one comprises from 50K to 1,00k lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to the sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in a single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved an 18-fold speedup on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks support for parallelization at the multi-zone level.
Future work will emphasize the development of methodology to work at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.
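Why data dependence analysis matters for directive insertion can be seen in a toy example. The sketch below is plain Python rather than Fortran with OpenMP, and it is not CAPO's analysis; reversing the iteration order stands in for the arbitrary schedule a parallel loop may produce. A loop whose iterations are independent gives the same answer in any order, while a loop with a loop-carried dependence does not, which is how a misplaced directive can create erroneous results.

```python
def independent_loop(a, order):
    """b[i] depends only on a[i], so iterations may run in any order."""
    b = [0] * len(a)
    for i in order:
        b[i] = 2 * a[i]
    return b

def carried_loop(a, order):
    """b[i] reads b[i-1]: a loop-carried dependence between iterations."""
    b = [0] * len(a)
    for i in order:
        b[i] = a[i] + (b[i - 1] if i > 0 else 0)
    return b

a = [1, 2, 3, 4]
forward = list(range(len(a)))
backward = list(reversed(forward))  # stand-in for an arbitrary parallel schedule

print(independent_loop(a, forward) == independent_loop(a, backward))
print(carried_loop(a, forward) == carried_loop(a, backward))
```

A conservative tool must assume a dependence whenever it cannot prove independence, which is consistent with the effort the abstract reports on correcting false dependencies.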
Nature of the water/aromatic parallel alignment interactions.

PubMed

Mitoraj, Mariusz P; Janjić, Goran V; Medaković, Vesna B; Veljković, Dušan Ž; Michalak, Artur; Zarić, Snežana D; Milčić, Miloš K

2015-01-30

The water/aromatic parallel alignment interactions are interactions where the water molecule or one of its O-H bonds is parallel to the aromatic ring plane. The calculated energies of the interactions are significant, up to ΔE(CCSD)(T)(limit) = -2.45 kcal mol(-1) at large horizontal displacement, out of benzene ring and CH bond region. These interactions are stronger than CH···O water/benzene interactions, but weaker than OH···π interactions. To investigate the nature of water/aromatic parallel alignment interactions, energy decomposition methods, symmetry-adapted perturbation theory, and extended transition state-natural orbitals for chemical valence (NOCV), were used. The calculations have shown that, for the complexes at large horizontal displacements, major contribution to interaction energy comes from electrostatic interactions between monomers, and for the complexes at small horizontal displacements, dispersion interactions are dominant binding force. The NOCV-based analysis has shown that in structures with strong interaction energies charge transfer of the type π → σ*(O-H) between the monomers also exists. © 2014 Wiley Periodicals, Inc.
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
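The loading scheme described above can be simulated in a few lines. This is a sketch under stated assumptions, not the disclosed implementation: the load description contents, the program names, the rule that the lowest node id becomes load leader, and the `read_file` callback are all hypothetical, and the collective operations are replaced by ordinary loops.

```python
# Hypothetical load description: program name -> nodes that must load it.
LOAD_DESCRIPTION = {
    "ocean.exe": [0, 1, 2, 3],
    "atmos.exe": [4, 5],
    "coupler.exe": [6],
}

def build_class_routes(description):
    """One route per program, containing only the nodes that need that program."""
    return {program: sorted(nodes) for program, nodes in description.items()}

def elect_leader(route):
    """Assumed election rule: the lowest node id, i.e. a min-reduction over the route."""
    return min(route)

def collective_load(description, read_file):
    """Each leader reads its program once and passes the image along its route."""
    node_memory, leaders, file_reads = {}, {}, 0
    for program, route in build_class_routes(description).items():
        leaders[program] = elect_leader(route)
        image = read_file(program)  # only the leader accesses the file system
        file_reads += 1
        for node in route:          # stand-in for a broadcast over the route
            node_memory[node] = (program, image)
    return node_memory, leaders, file_reads

node_memory, leaders, file_reads = collective_load(
    LOAD_DESCRIPTION, read_file=lambda name: f"<image of {name}>"
)
print(leaders)
print(f"{file_reads} file-system reads for {len(node_memory)} nodes")
```

The benefit is that file-system traffic scales with the number of distinct programs rather than with the number of nodes.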
The BLAZE language - A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

NASA Astrophysics Data System (ADS)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
IOPA: I/O-aware parallelism adaption for parallel programs

PubMed Central

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
IOPA: I/O-aware parallelism adaption for parallel programs.

PubMed

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
Parallelized Stochastic Cutoff Method for Long-Range Interacting Systems

NASA Astrophysics Data System (ADS)

Endo, Eishin; Toga, Yuta; Sasaki, Munetaka

2015-07-01

We present a method of parallelizing the stochastic cutoff (SCO) method, which is a Monte-Carlo method for long-range interacting systems. After interactions are eliminated by the SCO method, we subdivide a lattice into noninteracting interpenetrating sublattices. This subdivision enables us to parallelize the Monte-Carlo calculation in the SCO method. Such subdivision is found by numerically solving the vertex coloring of a graph created by the SCO method. We use an algorithm proposed by Kuhn and Wattenhofer to solve the vertex coloring by parallel computation. This method was applied to a two-dimensional magnetic dipolar system on an L × L square lattice to examine its parallelization efficiency. The result showed that, in the case of L = 2304, the speed of computation increased about 102 times by parallel computation with 288 processors.
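The role of vertex coloring in this scheme can be illustrated with a small sketch. It uses a simple sequential greedy coloring, not the Kuhn and Wattenhofer parallel algorithm the authors use, and a nearest-neighbour square lattice stands in for the interaction graph that the SCO method would actually produce. Sites of the same colour share no bond, so each colour class is a sublattice whose sites can be updated simultaneously.

```python
import itertools

def lattice_neighbors(L):
    """Nearest-neighbour bonds on an L x L square lattice with open boundaries."""
    nbrs = {(x, y): [] for x in range(L) for y in range(L)}
    for (x, y) in nbrs:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) in nbrs:
                nbrs[(x, y)].append((x + dx, y + dy))
    return nbrs

def greedy_coloring(nbrs):
    """Give each site the smallest colour not used by an already-coloured neighbour."""
    color = {}
    for site in sorted(nbrs):
        used = {color[n] for n in nbrs[site] if n in color}
        color[site] = next(c for c in itertools.count() if c not in used)
    return color

def sublattices(color):
    """Group sites by colour; each group can be updated in one parallel sweep."""
    groups = {}
    for site, c in color.items():
        groups.setdefault(c, []).append(site)
    return groups

nbrs = lattice_neighbors(4)
color = greedy_coloring(nbrs)
groups = sublattices(color)
print({c: len(sites) for c, sites in groups.items()})
```

For this toy graph the colouring reduces to the familiar checkerboard decomposition; a graph with longer-range surviving interactions would generally need more colours.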
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

NASA Astrophysics Data System (ADS)

Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

2016-04-01

Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering. References: [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publishers, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011 [4] NVIDIA.: 'NVidia CUDA C Programming Guide', version 5.0, NVidia (reference book) [5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013 [6] Konstantaras, A., Varley, M.R., Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, pp. 303-308, 2002 [7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012 [8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013 [9] Konstantaras, A. J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com) [10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
QA4, a language for artificial intelligence.

NASA Technical Reports Server (NTRS)

Derksen, J. A. C.

1973-01-01

Introduction of a language for problem solving and specifically robot planning, program verification, and synthesis and theorem proving. This language, called question-answerer 4 (QA4), embodies many features that have been found useful for constructing problem solvers but have to be programmed explicitly by the user of a conventional language. The most important features of QA4 are described, and examples are provided for most of the material introduced. Language features include backtracking, parallel processing, pattern matching, set manipulation, and pattern-triggered function activation. The language is most convenient for use in an interactive way and has extensive trace and edit facilities.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Vanrosendale, John

1989-01-01

Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The focus is on tensor product array computations, a simple but important class of numerical algorithms. The problem of programming 1-D kernel routines, such as parallel tridiagonal solvers, is addressed first, and then it is examined how such parallel kernels can be combined to form parallel tensor product algorithms.
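The way a 1-D kernel combines into a tensor product algorithm can be sketched as follows. This is plain sequential Python, not the language primitives the report proposes; the sequential Thomas algorithm stands in for a parallel tridiagonal solver, and the storage convention (`lower[0]` and `upper[-1]` unused) is an assumption. The system (T ⊗ T) u = f on a 2-D grid is solved by applying the 1-D solve to every row and then to every column; the solves within each sweep are independent of one another, which is where the parallelism lies.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for one tridiagonal system (no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def solve_along_rows(grid, lower, diag, upper):
    """One independent 1-D solve per row: the dimension that can run in parallel."""
    return [solve_tridiagonal(lower, diag, upper, row) for row in grid]

def transpose(grid):
    return [list(col) for col in zip(*grid)]

def tensor_product_solve(grid, lower, diag, upper):
    """Solve (T (x) T) u = f by sweeping the 1-D kernel over rows, then columns."""
    half = solve_along_rows(grid, lower, diag, upper)
    return transpose(solve_along_rows(transpose(half), lower, diag, upper))

# 1-D check with T = tridiag(-1, 2, -1): T applied to [1, 2, 3] gives [0, 0, 4].
lower, diag, upper = [0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0]
print(solve_tridiagonal(lower, diag, upper, [0.0, 0.0, 4.0]))
```

On a distributed memory machine the transpose between sweeps corresponds to data redistribution, which is the kind of step a higher-level language construct would be expected to hide.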
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic

Time-domain simulations are heavily used in today’s planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies to comply with industry standards. Because of the increased modeling complexity, state-of-the-art commercial packages take several times longer than real time to complete a dynamic simulation for a large-scale model. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored, such as parallelizing the calculation of generator current injection, identifying fast linear solvers for network solution, and parallelizing data outputs when interacting with APIs in the commercial package, TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
Methods for design and evaluation of parallel computing systems (The PISCES project)

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

1989-01-01

The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.
Computer-aided programming for message-passing system; Problems and a solution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, M.Y.; Gajski, D.D.

1989-12-01

As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.
Program summary
Program title: PNB.f90
Catalogue identifier: AEIK_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3052
No. of bytes in distributed program, including test data, etc.: 68 600
Distribution format: tar.gz
Programming language: Fortran 90 and OpenMPI
Computer: All shared or distributed memory parallel processors
Operating system: Unix/Linux
Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized.
RAM: Dependent upon N
Classification: 4.3, 4.12, 6.5
Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation.
Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree.
Running time: 5.1 s for the demo program supplied with the package.
Impact of an interactive anti-speeding threat appeal: how much threat is too much?

PubMed

Panić, Katarina; Cauberghe, Verolien; De Pelsmacker, Patrick

2011-05-01

This study investigates the impact of an interactive television public-service announcement (PSA) containing an anti-speeding threat appeal on feelings of telepresence and behavioral intention. In a 2 × 2 × 2 between-subjects factorial design with 213 participants, the level of threat evoked by a traditional PSA, by the interactive part of the PSA (dedicated advertising location or DAL) and by the preceding program context are manipulated to be either low or high. The results support the assumptions of the Extended Parallel Processing Model with regard to the effect of the level of perceived threat and perceived efficacy in an interactive media environment, and the important role of telepresence as a processing variable. The results of the three-way interaction effect of threat evoked by the program, the PSA and the DAL on telepresence show that when the threat levels of the program and the PSA are both either low or high, exposure to the threatening information in the DAL does not generate a significantly higher feeling of telepresence. However, when a low-threat program is followed by a high-threat PSA, the threat level of the DAL has a positive effect on telepresence. The same trend is found with a high-threat program and a low-threat PSA, although the effect of the threat evoked by the DAL on telepresence is not significant at conventional levels. Finally, there is a positive effect of telepresence on the behavioral intention to reduce speeding, which is partly mediated by the viewer's perceived efficacy to follow the recommended behavior.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.

76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-26

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Saini, Subhash (Technical Monitor)

1998-01-01

Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called F-Nets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Using CLIPS in the domain of knowledge-based massively parallel programming

NASA Technical Reports Server (NTRS)

Dvorak, Jiri J.

1994-01-01

The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
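The implicit parallelism of bitwise operators mentioned above can be illustrated by packing one Boolean value per fitness case into the bits of a single integer, so that one operator application evaluates every case at once. The following is a minimal sketch under that assumption; the Boolean expression, the function names, and the sample data are hypothetical and are not the authors' classifier.

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer, with case i stored in bit i."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def classify_all(x0, x1, x2, n_cases):
    """Evaluate (x0 AND x1) OR (NOT x2) for all packed fitness cases at once."""
    mask = (1 << n_cases) - 1  # keep only the bits that hold real cases
    return ((x0 & x1) | (~x2 & mask)) & mask


def count_correct(predicted, target, n_cases):
    """Count the cases where the packed prediction matches the packed target."""
    mask = (1 << n_cases) - 1
    return bin(~(predicted ^ target) & mask).count("1")
```

With four cases packed as `pack([1, 1, 0, 0])`, `pack([1, 0, 1, 0])` and `pack([0, 1, 1, 0])`, a single call to `classify_all` yields the outputs for all four cases, and `count_correct` turns the comparison with the targets into a fitness score without a per-case loop.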
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task and data parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14× to 108× speedup over sequential baselines.
Recent highlights from STAR

NASA Astrophysics Data System (ADS)

Zha, Wangmei

2018-02-01

The Solenoidal Tracker at RHIC (STAR) experiment takes advantage of its excellent tracking and particle identification capabilities at mid-rapidity to explore the properties of strongly interacting QCD matter created in heavy-ion collisions at RHIC. The STAR collaboration presented 7 parallel and 2 plenary talks at Strangeness in Quark Matter 2017 and covered various topics including heavy flavor measurements, bulk observables, electro-magnetic probes and the upgrade program. This paper highlights some of the selected results.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

NASA Astrophysics Data System (ADS)

Akil, Mohamed

2017-05-01

Real-time processing is becoming increasingly important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool, and it is a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper surveys the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
[Evaluation of our psycho-educative program by participating caregivers].

PubMed

Bier, J C; Van den Berge, D; de Wouters d'Oplinter, N; Bosman, N; Fery, P

2010-09-01

The difficulties caused by dementia syndromes call for systemic care. Among the therapies aimed specifically at caregivers, psychoeducational interventions appear to be the most effective against neuropsychiatric symptoms. Psychoeducation teaches caregivers to modify their interactions with patients through a better understanding of the illness and of the patient. Our training program, "Pour mieux vivre avec la maladie d'Alzheimer", is given to groups of eight to twelve people and consists of twelve sessions of two hours each. To make attendance as easy as possible, we recently added art workshops in which patients are looked after during the sessions. These art-therapy sessions, held in parallel with our psychoeducational program, will be evaluated with the same rigorous methodology. The critical evaluations made by participants at the end of the program show that our main objective (teaching caregivers to modify their interactions with patients) was met, and that the program also improved social contacts and taught participants to call on existing support services. These preliminary results argue strongly for continuing and even extending this kind of caregiver support.
PC-CUBE: A Personal Computer Based Hypercube

NASA Technical Reports Server (NTRS)

Ho, Alex; Fox, Geoffrey; Walker, David; Snyder, Scott; Chang, Douglas; Chen, Stanley; Breaden, Matt; Cole, Terry

1988-01-01

PC-CUBE is an ensemble of IBM PCs or close compatibles connected in the hypercube topology with ordinary computer cables. Communication occurs at the rate of 115.2 kbaud via the RS-232 serial links. Available for PC-CUBE are the Crystalline Operating System III (CrOS III), the Mercury Operating System, and CUBIX and PLOTIX, which are parallel I/O and graphics libraries. A CrOS performance monitor was developed to facilitate the measurement of communication and computation time of a program and their effects on performance. Also available are CXLISP, a parallel version of the XLISP interpreter; GRAFIX, some graphics routines for the EGA and CGA; and a general execution profiler for determining execution time spent by program subroutines. PC-CUBE provides a programming environment similar to all hypercube systems running CrOS III, Mercury and CUBIX. In addition, every node (personal computer) has its own graphics display monitor and storage devices. These allow data to be displayed or stored at every processor, which has much instructional value and enables easier debugging of applications. Some application programs which are taken from the book Solving Problems on Concurrent Processors (Fox 88) were implemented with graphics enhancement on PC-CUBE. The applications range from solving the Mandelbrot set, Laplace equation, wave equation, long range force interaction, to WaTor, an ecological simulation.
Multilevel Parallelization of AutoDock 4.2.

PubMed

Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

2011-04-28

Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
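For reference, the classical form of Amdahl's Law that the note sets out to amend can be evaluated directly. This is a minimal sketch, assuming a program whose parallelizable fraction is `p` and which runs on `n` processors; the function names are illustrative, and the amended bounds derived from the activity set model are not reproduced here.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Classical Amdahl speed-up 1 / ((1 - p) + p / n).

    p is the fraction of the serial run time that can be parallelized,
    n is the number of processors.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Speed-up approached as n grows without bound: 1 / (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)
```

The limit function makes the pessimism of the classical law concrete: however many processors are added, the serial fraction alone caps the achievable speed-up.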
Programs as Polypeptides.

PubMed

Williams, Lance R

2016-01-01

Object-oriented combinator chemistry (OOCC) is an artificial chemistry with composition devices borrowed from object-oriented and functional programming languages. Actors in OOCC are embedded in space and subject to diffusion; since they are neither created nor destroyed, their mass is conserved. Actors use programs constructed from combinators to asynchronously update their own states and the states of other actors in their neighborhoods. The fact that programs and combinators are themselves reified as actors makes it possible to build programs that build programs from combinators of a few primitive types using asynchronous spatial processes that resemble chemistry as much as computation. To demonstrate this, OOCC is used to define a parallel, asynchronous, spatially distributed self-replicating system modeled in part on the living cell. Since interactions among its parts result in the construction of more of these same parts, the system is strongly constructive. The system's high normalized complexity is contrasted with that of a simple composome.
Accelerating sino-atrium computer simulations with graphic processing units.

PubMed

Zhang, Hong; Xiao, Zheng; Lin, Shien-fong

2015-01-01

Sino-atrial node cells (SANCs) play a significant role in rhythmic firing. To investigate their role in arrhythmia and interactions with the atrium, computer simulations based on cellular dynamic mathematical models are generally used. However, the large-scale computation usually makes research difficult, given the limited computational power of Central Processing Units (CPUs). In this paper, an accelerating approach with Graphic Processing Units (GPUs) is proposed in a simulation consisting of the SAN tissue and the adjoining atrium. By using the operator splitting method, the computational task was made parallel. Three parallelization strategies were then put forward. The strategy with the shortest running time was further optimized by considering block size, data transfer and partition. The results showed that for a simulation with 500 SANCs and 30 atrial cells, the execution time taken by the non-optimized program decreased 62% with respect to a serial program running on CPU. The execution time decreased by 80% after the program was optimized. The larger the tissue was, the more significant the acceleration became. The results demonstrated the effectiveness of the proposed GPU-accelerating methods and their promising applications in more complicated biological simulations.
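The reported reductions in execution time can be converted into speed-up factors with simple arithmetic: a run that takes a fraction (1 - r) of the reference time is 1 / (1 - r) times faster. The sketch below assumes that both the 62% and the 80% figures are measured against the serial CPU program, which the abstract does not state explicitly for the second figure; the function name is illustrative.

```python
def speedup_from_reduction(r: float) -> float:
    """Convert a fractional reduction in run time into a speed-up factor.

    r = 0.62 means the new run takes 38% of the reference time.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    return 1.0 / (1.0 - r)


# Figures quoted in the abstract, under the assumption stated above.
non_optimized = speedup_from_reduction(0.62)  # roughly 2.6 times faster
optimized = speedup_from_reduction(0.80)      # roughly 5 times faster
```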
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

PubMed

Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

2013-10-24

Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.
PyPele Rewritten To Use MPI

NASA Technical Reports Server (NTRS)

Hockney, George; Lee, Seungwon

2008-01-01

A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message-passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.
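Embarrassingly parallel dispatch of the kind described here needs no communication between tasks: each process can select its own share of the work from its rank and the total number of processes. The sketch below shows that selection in plain Python and deliberately makes no MPI calls; in an MPI program the values of `rank` and `size` would come from the communicator. The function `tasks_for_rank` and the task list are hypothetical and are not PyPele's interface.

```python
def tasks_for_rank(tasks, rank, size):
    """Return the tasks handled by process `rank` out of `size` processes.

    Tasks are dealt out round-robin, so every task goes to exactly one
    process and the shares differ in length by at most one.
    """
    if size < 1 or not 0 <= rank < size:
        raise ValueError("need size >= 1 and 0 <= rank < size")
    return tasks[rank::size]


def simulate_dispatch(tasks, size):
    """Show what each of `size` processes would receive (no MPI involved)."""
    return {rank: tasks_for_rank(tasks, rank, size) for rank in range(size)}
```

Because the split depends only on `rank` and `size`, the same application code runs unchanged on any number of processes, which is the property that lets existing task-based programs move between dispatch mechanisms.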
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.

PubMed

Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik

2008-03-01

Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors from algorithmic optimizations and hand-coded routines and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems but also it provides that simulation performance on quite modest numbers of standard cluster nodes.

A survey of parallel programming tools

NASA Technical Reports Server (NTRS)

Cheng, Doreen Y.

1991-01-01

This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

NASA Technical Reports Server (NTRS)

Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

2002-01-01

In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of that program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.
9th Annual Symposium on Self-Monitoring of Blood Glucose, April 28-30, 2016, Madrid, Spain.

PubMed

Parkin, Christopher G; Homberg, Anita; Hinzmann, Rolf

2016-11-01

International experts in the field of diabetes and diabetes technology met in Madrid, Spain, for the 9th Annual Symposium on Self-Monitoring of Blood Glucose. The goal of these meetings is to establish a global network of experts, thus facilitating new collaborations and research projects to improve the lives of people with diabetes. The 2016 meeting comprised a comprehensive scientific program, parallel interactive workshops, and two keynote lectures.
The design and implementation of CRT displays in the TCV real-time simulation

NASA Technical Reports Server (NTRS)

Leavitt, J. B.; Tariq, S. I.; Steinmetz, G. G.

1975-01-01

The design and application of computer graphics to the Terminal Configured Vehicle (TCV) program were described. A Boeing 737-100 series aircraft was modified with a second flight deck and several computers installed in the passenger cabin. One of the elements in support of the TCV program is a sophisticated simulation system developed to duplicate the operation of the aft flight deck. This facility consists of an aft flight deck simulator, equipped with realistic flight instrumentation, a CDC 6600 computer, and an Adage graphics terminal; this terminal presents to the simulator pilot displays similar to those used on the aircraft with equivalent man-machine interactions. These two displays form the primary flight instrumentation for the pilot and are dynamic images depicting critical flight information. The graphics terminal is a high speed interactive refresh-type graphics system. To support the cockpit display, two remote CRT's were wired in parallel with two of the Adage scopes.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

2015-01-01

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
Implementation of Multivariable Logic Functions in Parallel by Electrically Addressing a Molecule of Three Dopants in Silicon.

PubMed

Fresch, Barbara; Bocquel, Juanita; Hiluf, Dawit; Rogge, Sven; Levine, Raphael D; Remacle, Françoise

2017-07-05

To realize low-power, compact logic circuits, one can explore parallel operation on single nanoscale devices. An added incentive is to use multivalued (as distinct from Boolean) logic. Here, we theoretically demonstrate that the computation of all the possible outputs of a multivariate, multivalued logic function can be implemented in parallel by electrical addressing of a molecule made up of three interacting dopant atoms embedded in Si. The electronic states of the dopant molecule are addressed by pulsing a gate voltage. By simulating the time evolution of the nonstationary electronic density built by the gate voltage, we show that one can implement a molecular decision tree that provides in parallel all the outputs for all the inputs of the multivariate, multivalued logic function. The outputs are encoded in the populations and in the bond orders of the dopant molecule, which can be measured using an STM tip. We show that the implementation of the molecular logic tree is equivalent to a spectral function decomposition. The function that is evaluated can be field-programmed by changing the time profile of the pulsed gate voltage. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

PubMed

Stone, John E; Gohara, David; Shi, Guochun

2010-05-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
Development of a Dynamic Time Sharing Scheduled Environment Final Report CRADA No. TC-824-94E

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jette, M.; Caliga, D.

Massively parallel computers, such as the Cray T3D, have historically supported resource sharing solely with space sharing. In that method, multiple problems are solved by executing them on distinct processors. This project developed a dynamic time- and space-sharing scheduler to achieve greater interactivity and throughput than could be achieved with space-sharing alone. CRI and LLNL worked together on the design, testing, and review aspects of this project. There were separate software deliverables. CRI implemented a general purpose scheduling system as per the design specifications. LLNL ported the local gang scheduler software to the LLNL Cray T3D. In this approach, processors are allocated simultaneously to all components of a parallel program (in a “gang”). Program execution is preempted as needed to provide for interactivity. Programs are also relocated to different processors as needed to efficiently pack the computer’s torus of processors. In phase one, CRI developed an interface specification after discussions with LLNL for system-level software supporting a time- and space-sharing environment on the LLNL T3D. The two parties also discussed interface specifications for external control tools (such as scheduling policy tools, system administration tools) and applications programs. CRI assumed responsibility for the writing and implementation of all the necessary system software in this phase. In phase two, CRI implemented job-rolling on the Cray T3D, a mechanism for preempting a program, saving its state to disk, and later restoring its state to memory for continued execution. LLNL ported its gang scheduler to the LLNL T3D utilizing the CRI interface implemented in phases one and two. During phase three, the functionality and effectiveness of the LLNL gang scheduler was assessed to provide input to CRI time- and space-sharing efforts.
CRI will utilize this information in the development of general schedulers suitable for other sites and future architectures.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; an Mey, Dieter; Hatay, Ferhat F.

2003-01-01

With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. 
Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. 
Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Efficient partitioning and assignment on programs for multiprocessor execution

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1993-01-01

The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

PubMed Central

Stone, John E.; Gohara, David; Shi, Guochun

2010-01-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981
Genetic algorithms using SISAL parallel programming language

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tejada, S.

1994-05-06

Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I will discuss the implementation and performance of parallel genetic algorithms in SISAL.
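The three steps named in this abstract (mutation, crossover, fitness evaluation) can be sketched as a minimal genetic algorithm. The example is written in Python rather than SISAL, and the toy objective (counting 1 bits), the population size, the mutation rate, and all function names are assumptions chosen for illustration, not details from the paper. Each fitness value depends on one individual only, which is why that step is a natural candidate for parallel evaluation.

```python
import random


def fitness(bits):
    # Toy objective (an assumption): number of 1 bits in the string.
    return sum(bits)


def crossover(a, b, rng):
    # Single-point crossover of two equal-length parents.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(pop_size=20, length=16, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Independent per-individual work: this map is the step that
        # could be distributed across processors.
        scores = list(map(fitness, population))
        ranked = [ind for _, ind in sorted(zip(scores, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   rate, rng)
            for _ in range(pop_size - 1)
        ]
        # Carry the best individual over unchanged (elitism).
        population = [ranked[0]] + children
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

A fixed seed keeps the run reproducible, which loosely mirrors the determinism the abstract attributes to SISAL.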
An Expert System for the Development of Efficient Parallel Code

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

2004-01-01

We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Interactions between flames on parallel solid surfaces

NASA Technical Reports Server (NTRS)

Urban, David L.

1995-01-01

The interactions between flames spreading over parallel solid sheets of paper are being studied in normal gravity and in microgravity. This geometry is of practical importance since in most heterogeneous combustion systems, the condensed phase is non-continuous and spatially distributed. This spatial distribution can strongly affect burning and/or spread rate. This is due to radiant and diffusive interactions between the surface and the flames above the surfaces. Tests were conducted over a variety of pressures and separation distances to expose the influence of the parallel sheets on oxidizer transport and on radiative feedback.
Solving Integer Programs from Dependence and Synchronization Problems

DTIC Science & Technology

1993-03-01

Solving Integer Programs from Dependence and Synchronization Problems. Jaspal Subhlok. March 1993. CMU-CS-93-130. School of Computer Science ... method is an exact and efficient way of solving integer programming problems arising in dependence and synchronization analysis of parallel programs ... Keywords: exact dependence testing, integer programming, parallelizing compilers, parallel program analysis, synchronization analysis
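To make concrete the kind of integer problem that arises in dependence analysis, the sketch below asks whether a loop that writes `A[a*i + b]` and reads `A[c*j + d]`, with `i` and `j` both ranging over `0..n-1`, can ever touch the same element. It answers by brute-force enumeration, which is exact but is not the method of the report (the surviving text does not describe that method); the function name and the loop shape are assumptions made for illustration.

```python
def has_dependence(a, b, c, d, n):
    """Exact test: do integers i, j in [0, n) exist with a*i + b == c*j + d?

    Brute-force enumeration over the iteration space; a stand-in for a
    real integer programming solver, usable only for small bounds.
    """
    written = {a * i + b for i in range(n)}
    return any(c * j + d in written for j in range(n))


# A[2*i] written, A[2*j + 1] read: even and odd indices never meet.
print(has_dependence(2, 0, 2, 1, 100))
# A[i] written, A[j + 5] read, 10 iterations: i = j + 5 has solutions.
print(has_dependence(1, 0, 1, 5, 10))
# A[i] written, A[j + 10] read, 10 iterations: solutions fall outside the bounds.
print(has_dependence(1, 0, 1, 10, 10))
```

The third case shows why loop bounds matter: the equation has integer solutions in general, but none inside the iteration space, so an exact test reports no dependence.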

The MOLDY short-range molecular dynamics package

NASA Astrophysics Data System (ADS)

Ackland, G. J.; D'Mellow, K.; Daraszewicz, S. L.; Hepburn, D. J.; Uhrin, M.; Stratford, K.

2011-12-01

We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials and specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron-carbon alloy. Program summary Program title: MOLDY Catalogue identifier: AEJU_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJU_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License version 2 No. of lines in distributed program, including test data, etc.: 382 881 No. of bytes in distributed program, including test data, etc.: 6 705 242 Distribution format: tar.gz Programming language: Fortran 95/OpenMP Computer: Any Operating system: Any Has the code been vectorised or parallelized?: Yes. OpenMP is required for parallel execution RAM: 100 MB or more Classification: 7.7 Nature of problem: Moldy addresses the problem of many atoms (of order 10^6) interacting via a classical interatomic potential on a timescale of microseconds.
It is designed for problems where statistics must be gathered over a number of equivalent runs, such as measuring thermodynamic properties, diffusion, radiation damage, fracture, twinning deformation, nucleation and growth of phase transitions, sputtering etc. In the vast majority of materials, the interactions are non-pairwise, and the code must be able to deal with many-body forces. Solution method: Molecular dynamics involves integrating Newton's equations of motion. MOLDY uses Verlet (for good energy conservation) or predictor-corrector (for accurate trajectories) algorithms. It is parallelised using OpenMP. It also includes a static minimisation routine to find the lowest energy structure. Boundary conditions for surfaces, clusters, grain boundaries, thermostat (Nose), barostat (Parrinello-Rahman), and externally applied strain are provided. The initial configuration can be either a repeated unit cell or have all atoms given explicitly. Initial velocities are generated internally, but it is also possible to specify the velocity of a particular atom. A wide range of interatomic force models are implemented, including embedded atom, Morse or Lennard-Jones. Thus the program is especially well suited to calculations of metals. Restrictions: The code is designed for short-ranged potentials, and there is no Ewald sum. Thus for long range interactions where all particles interact with all others, the order-N scaling will fail. Different interatomic potential forms require recompilation of the code. Additional comments: There is a set of associated open-source analysis software for postprocessing and visualisation. This includes local crystal structure recognition and identification of topological defects. Running time: A set of test modules for running time are provided. The code scales as order N. The parallelisation shows near-linear scaling with number of processors in a shared memory environment. 
A typical run of a few tens of nanometers for a few nanoseconds will run on a timescale of days on a multiprocessor desktop.
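The solution method above mentions Verlet integration and its good energy conservation. The sketch below shows the velocity form of the Verlet scheme for a single particle on a spring and prints the energy drift after many steps. It is an illustration only: the summary does not say which Verlet variant MOLDY uses, and the one-dimensional harmonic force, the time step, and the function names are assumptions made for this example.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate m * x'' = force(x) with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # advance the position
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # average old and new acceleration
        a = a_new
    return x, v


def energy(x, v, mass=1.0, k=1.0):
    # Kinetic plus potential energy of a harmonic oscillator.
    return 0.5 * mass * v * v + 0.5 * k * x * x


def spring(x, k=1.0):
    # Hypothetical test force: a unit spring, F = -k * x.
    return -k * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring, mass=1.0, dt=0.01, steps=10_000)
print("energy drift:", abs(energy(x1, v1) - energy(x0, v0)))
print("position vs exact cos(t):", x1, math.cos(100.0))
```

For this test problem the energy error stays bounded instead of growing with the number of steps, which is the property the summary refers to as good energy conservation.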
SMMP v. 3.0—Simulating proteins and protein interactions in Python and Fortran

NASA Astrophysics Data System (ADS)

Meinke, Jan H.; Mohanty, Sandipan; Eisenmenger, Frank; Hansmann, Ulrich H. E.

2008-03-01

We describe a revised and updated version of the program package SMMP. SMMP is an open-source FORTRAN package for molecular simulation of proteins within the standard geometry model. It is designed as a simple and inexpensive tool for researchers and students to become familiar with protein simulation techniques. SMMP 3.0 sports a revised API increasing its flexibility, an implementation of the Lund force field, multi-molecule simulations, a parallel implementation of the energy function, Python bindings, and more. Program summary Title of program: SMMP Catalogue identifier: ADOJ_v3_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADOJ_v3_0.html Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html Programming language used: FORTRAN, Python No. of lines in distributed program, including test data, etc.: 52 105 No. of bytes in distributed program, including test data, etc.: 599 150 Distribution format: tar.gz Computer: Platform independent Operating system: OS independent RAM: 2 Mbytes Classification: 3 Does the new version supersede the previous version?: Yes Nature of problem: Molecular mechanics computations and Monte Carlo simulation of proteins. Solution method: Utilizes ECEPP2/3, FLEX, and Lund potentials. Includes Monte Carlo simulation algorithms for canonical, as well as for generalized ensembles. Reasons for new version: API changes and increased functionality. Summary of revisions: Added Lund potential; parameters used in subroutines are now passed as arguments; multi-molecule simulations; parallelized energy calculation for ECEPP; Python bindings. Restrictions: The consumed CPU time increases with the size of protein molecule. Running time: Depends on the size of the simulated molecule.
Geological mapping in northwestern Saudi Arabia using LANDSAT multispectral techniques

NASA Technical Reports Server (NTRS)

Blodget, H. W.; Brown, G. F.; Moik, J. G.

1975-01-01

Various computer enhancement and data extraction systems using LANDSAT data were assessed and used to complement a continuing geologic mapping program. Interactive digital classification techniques using both the parallelepiped and maximum-likelihood statistical approaches achieve very limited success in areas of highly dissected terrain. Computer enhanced imagery developed by color compositing stretched MSS ratio data was constructed for a test site in northwestern Saudi Arabia. Initial results indicate that several igneous and sedimentary rock types can be discriminated.
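For readers unfamiliar with the parallelepiped approach named in this abstract, the sketch below shows its usual form: each class is described by a box of per-band minimum and maximum values taken from training pixels, and a pixel is assigned to the first class whose box contains it. The class labels, band values, and function names are made up for illustration and do not come from the study.

```python
def train_boxes(samples):
    """Build per-class (low, high) bounds for each band from training pixels.

    `samples` maps a class label to a list of pixel vectors, one value
    per spectral band.
    """
    boxes = {}
    for label, pixels in samples.items():
        bands = list(zip(*pixels))  # regroup values band by band
        boxes[label] = [(min(band), max(band)) for band in bands]
    return boxes


def classify(pixel, boxes):
    """Return the first class whose box contains the pixel, else None."""
    for label, bounds in boxes.items():
        if all(lo <= value <= hi for value, (lo, hi) in zip(pixel, bounds)):
            return label
    return None


# Hypothetical three-band training pixels for two rock types.
training = {
    "granite": [(52, 80, 61), (58, 86, 66), (55, 83, 63)],
    "sandstone": [(90, 120, 101), (97, 128, 108), (94, 124, 104)],
}
boxes = train_boxes(training)
print(classify((56, 82, 64), boxes))    # falls inside the granite box
print(classify((95, 125, 105), boxes))  # falls inside the sandstone box
print(classify((10, 10, 10), boxes))    # outside every box, left unclassified
```

Pixels that fall in no box stay unclassified, and boxes that overlap make the result depend on class order, which is one general weakness of this kind of classifier.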
7th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG), May 8–10, 2014, Helsinki, Finland

PubMed Central

Mlinac, Anita; Hinzmann, Rolf

2014-01-01

International experts in the fields of diabetes, diabetes technology, endocrinology, mobile health, sport science, and regulatory issues gathered for the 7th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG). The aim of this meeting was to facilitate new collaborations and research projects to improve the lives of people with diabetes. The 2014 meeting comprised a comprehensive scientific program, parallel interactive workshops, and two keynote lectures. PMID:25211215
10th Annual Symposium on Self-Monitoring of Blood Glucose, April 27–29, 2017, Warsaw, Poland

PubMed Central

Homberg, Anita; Hinzmann, Rolf

2018-01-01

International experts in the field of diabetes and diabetes technology met in Warsaw, Poland, for the 10th Annual Symposium on Self-Monitoring of Blood Glucose. The goal of these meetings is to establish a global network of experts to facilitate new collaborations and research projects that can improve the lives of people with diabetes. The 2017 meeting comprised a comprehensive scientific program, parallel interactive workshops, and four keynote lectures. PMID:29135283
BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures

NASA Astrophysics Data System (ADS)

Deslippe, Jack; Samsonidze, Georgy; Strubbe, David A.; Jain, Manish; Cohen, Marvin L.; Louie, Steven G.

2012-06-01

BerkeleyGW is a massively parallel computational package for electron excited-state properties that is based on the many-body perturbation theory employing the ab initio GW and GW plus Bethe-Salpeter equation methodology. It can be used in conjunction with many density-functional theory codes for ground-state properties, including PARATEC, PARSEC, Quantum ESPRESSO, SIESTA, and Octopus. The package can be used to compute the electronic and optical properties of a wide variety of material systems from bulk semiconductors and metals to nanostructured materials and molecules. The package scales to 10 000s of CPUs and can be used to study systems containing up to 100s of atoms. Program summary Program title: BerkeleyGW Catalogue identifier: AELG_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Open source BSD License. See code for licensing details. No. of lines in distributed program, including test data, etc.: 576 540 No. of bytes in distributed program, including test data, etc.: 110 608 809 Distribution format: tar.gz Programming language: Fortran 90, C, C++, Python, Perl, BASH Computer: Linux/UNIX workstations or clusters Operating system: Tested on a variety of Linux distributions in parallel and serial as well as AIX and Mac OSX RAM: (50-2000) MB per CPU (Highly dependent on system size) Classification: 7.2, 7.3, 16.2, 18 External routines: BLAS, LAPACK, FFTW, ScaLAPACK (optional), MPI (optional). All available under open-source licenses. Nature of problem: The excited state properties of materials involve the addition or subtraction of electrons as well as the optical excitations of electron-hole pairs. The excited particles interact strongly with other electrons in a material system. This interaction affects the electronic energies, wavefunctions and lifetimes. 
It is well known that ground-state theories, such as standard methods based on density-functional theory, fail to correctly capture this physics. Solution method: We construct and solve Dyson's equation for the quasiparticle energies and wavefunctions within the GW approximation for the electron self-energy. We additionally construct and solve the Bethe-Salpeter equation for the correlated electron-hole (exciton) wavefunctions and excitation energies. Restrictions: The material size is limited in practice by the computational resources available. Materials with up to 500 atoms per periodic cell can be studied on large HPCs. Additional comments: The distribution file for this program is approximately 110 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead an HTML file giving details of how the program can be obtained is sent. Running time: 1-1000 minutes (depending greatly on system size and processor number).
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Whistlers, Helicons, Lower Hybrid Waves: the Physics of RF Wave Absorption Without Cyclotron Resonances

NASA Astrophysics Data System (ADS)

Pinsker, R. I.

2014-10-01

In hot magnetized plasmas, two types of linear collisionless absorption processes are used to heat and drive noninductive current: absorption at ion or electron cyclotron resonances and their harmonics, and absorption by Landau damping and the transit-time-magnetic-pumping (TTMP) interactions. This tutorial discusses the latter process, i.e., parallel interactions between rf waves and electrons in which cyclotron resonance is not involved. Electron damping by the parallel interactions can be important in the ICRF, particularly in the higher harmonic region where competing ion cyclotron damping is weak, as well as in the Lower Hybrid Range of Frequencies (LHRF), which is in the neighborhood of the geometric mean of the ion and electron cyclotron frequencies. On the other hand, absorption by parallel processes is not significant in conventional ECRF schemes. Parallel interactions are especially important for the realization of high current drive efficiency with rf waves, and an application of particular recent interest is current drive with the whistler or helicon wave at high to very high (i.e., the LHRF) ion cyclotron harmonics. The scaling of absorption by parallel interactions with wave frequency is examined and the advantages and disadvantages of fast (helicons/whistlers) and slow (lower hybrid) waves in the LHRF in the context of reactor-grade tokamak plasmas are compared. In this frequency range, both wave modes can propagate in a significant fraction of the discharge volume; the ways in which the two waves can interact with each other are considered. The use of parallel interactions to heat and drive current in practice will be illustrated with examples from past experiments; also looking forward, this tutorial will provide an overview of potential applications in tokamak reactors. Supported by the US Department of Energy under DE-FC02-04ER54698.
Logic design for dynamic and interactive recovery.

NASA Technical Reports Server (NTRS)

Carter, W. C.; Jessep, D. C.; Wadia, A. B.; Schneider, P. R.; Bouricius, W. G.

1971-01-01

Recovery in a fault-tolerant computer means the continuation of system operation with data integrity after an error occurs. This paper delineates two parallel concepts embodied in the hardware and software functions required for recovery: detection, diagnosis, and reconfiguration for the hardware; data integrity, checkpointing, and restart for the software. The hardware relies on the recovery variable set, checking circuits, and diagnostics, and the software relies on the recovery information set, audit, and reconstruct routines, to characterize the system state and assist in recovery when required. Of particular utility is a hardware unit, the recovery control unit, which serves as an interface between error detection and software recovery programs in the supervisor and provides dynamic interactive recovery.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

DOE PAGES

Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

2013-01-01

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
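The definition of work time inflation given in this abstract can be illustrated with a small calculation. The following is a minimal sketch, not the authors' measurement method: it assumes that each thread's time spent doing useful work (excluding idleness and scheduling overhead) has already been measured, and the function names and timing values are hypothetical.

```python
def work_time_inflation(sequential_work_s, per_thread_work_s):
    """Extra work time (seconds) spent by all threads combined,
    beyond the time the same work takes in a sequential run."""
    return sum(per_thread_work_s) - sequential_work_s


def inflation_ratio(sequential_work_s, per_thread_work_s):
    """Total parallel work time divided by sequential work time.
    A value of 1.0 means no inflation."""
    return sum(per_thread_work_s) / sequential_work_s


if __name__ == "__main__":
    # Hypothetical measurements: 10 s of sequential work, split over 4 threads.
    sequential = 10.0
    threads = [3.0, 3.5, 3.0, 3.5]
    print("inflation (s):", work_time_inflation(sequential, threads))
    print("inflation ratio:", inflation_ratio(sequential, threads))
```

In this made-up case the four threads together spend 13 s on work that takes 10 s sequentially, which is the kind of gap the abstract attributes to increased data access latency on NUMA systems.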
Distributed and parallel Ada and the Ada 9X recommendations

NASA Technical Reports Server (NTRS)

Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

1992-01-01

Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometime in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are comprised of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major language issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
RIACS/USRA

NASA Technical Reports Server (NTRS)

Oliger, Joseph

1993-01-01

The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on 6 June 1983. RIACS is privately operated by USRA, a consortium of universities with research programs in the aerospace sciences, under contract with NASA. The primary mission of RIACS is to provide research and expertise in computer science and scientific computing to support the scientific missions of NASA ARC. The research carried out at RIACS must change its emphasis from year to year in response to NASA ARC's changing needs and technological opportunities. A flexible scientific staff is provided through a university faculty visitor program, a post doctoral program, and a student visitor program. Not only does this provide appropriate expertise but it also introduces scientists outside of NASA to NASA problems. A small group of core RIACS staff provides continuity and interacts with an ARC technical monitor and scientific advisory group to determine the RIACS mission. RIACS activities are reviewed and monitored by a USRA advisory council and ARC technical monitor. Research at RIACS is currently being done in the following areas: Parallel Computing, Advanced Methods for Scientific Computing, High Performance Networks and Technology, and Learning Systems. Parallel compiler techniques, adaptive numerical methods for flows in complicated geometries, and optimization were identified as important problems to investigate for ARC's involvement in the Computational Grand Challenges of the next decade.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele

2001-01-01

This viewgraph presentation provides information on support available for debugging automatically parallelized computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence ensuring that the code transformation was accurate.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Biocellion: accelerating computer simulation of multicellular biological system models

PubMed Central

Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

2014-01-01

Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

1997-01-01

Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
[Role of Hormones in Perinatal and Early Postnatal Development: Possible Contribution to Programming/Imprinting Phenomena].

PubMed

Goudochnikov, V I

2015-01-01

In parallel to formulating the paradigm of developmental origins of health and disease (DOHaD), the search began for mechanisms of programming/imprinting in ontogeny. Some recent evidence has revealed the important role of glucocorticoids in such mechanisms. However, in the last decades numerous data have been accumulated on the participation of other hormones in developmental bioregulation. In the present article we analyse these data, as referred to melatonin, but also to neuroactive steroids, somatolactogens and related peptides: insulin-like growth factor of type I (IGF-I) and oxytocin, i.e. peptide regulators related to growth and lactation respectively. Special attention was devoted to the evidence of glucocorticoid interactions with some of these hormones.

Pteros: fast and easy to use open-source C++ library for molecular analysis.

PubMed

Yesylevskyy, Semen O

2012-07-15

The open-source Pteros library for molecular modeling and analysis of molecular dynamics trajectories in the C++ programming language is introduced. Pteros provides a number of routine analysis operations ranging from reading and writing trajectory files and geometry transformations to structural alignment and computation of nonbonded interaction energies. The library features asynchronous trajectory reading and parallel execution of several analysis routines, which greatly simplifies development of computationally intensive trajectory analysis algorithms. The Pteros programming interface is very simple and intuitive while the source code is well documented and easily extendible. Pteros is available for free under the open-source Artistic License from http://sourceforge.net/projects/pteros/. Copyright © 2012 Wiley Periodicals, Inc.
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine), which is a software system for linking clusters of machines. Second, a set of three basic applications was selected. The applications consist of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor on performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
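The speedup metric mentioned in this abstract, and the reason coarse-grain parallelism fares better on a network, can be sketched in a few lines. This is an illustrative sketch only: the cost model (evenly divided compute time plus a fixed latency per message) and all numbers are assumptions made here, not measurements or formulas from the report.

```python
def speedup(t_serial, t_parallel):
    """Speedup = elapsed time on one machine / elapsed time in parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Speedup per worker; 1.0 would be ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_workers


def modeled_parallel_time(t_serial, n_workers, n_messages, latency_s):
    """Hypothetical cost model: compute divides evenly across workers,
    and every message adds a fixed network latency."""
    return t_serial / n_workers + n_messages * latency_s


if __name__ == "__main__":
    t_serial, workers, latency = 100.0, 4, 0.002  # assumed values
    fine = modeled_parallel_time(t_serial, workers, n_messages=10_000, latency_s=latency)
    coarse = modeled_parallel_time(t_serial, workers, n_messages=100, latency_s=latency)
    print("fine-grain speedup:  ", round(speedup(t_serial, fine), 2))
    print("coarse-grain speedup:", round(speedup(t_serial, coarse), 2))
```

Under these assumed values the same computation with one hundred times fewer messages gets much closer to the ideal speedup of 4, which matches the qualitative conclusion stated in the abstract.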
WFIRST: Science from the Guest Investigator and Parallel Observation Programs

NASA Astrophysics Data System (ADS)

Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua

2018-01-01

The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects, surveys of the outer Milky Way halo, comprehensive studies of cluster galaxies, to unique and new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe. WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees at no impact to the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
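The change from a sequential summation to a binary tree summation described in this abstract can be illustrated with a short sketch. This is not the authors' Fortran coarray code: it is a serial Python model of the communication pattern, written under the assumption that each list element stands for the partial sum held by one image, and the function name `tree_sum` is hypothetical.

```python
def tree_sum(values):
    """Combine per-image partial sums pairwise in a binary tree.

    Returns (total, rounds). In a real parallel run, every addition
    inside one round could proceed at the same time, so the number of
    rounds, not the number of additions, bounds the elapsed time.
    """
    partial = list(values)
    if not partial:
        raise ValueError("need at least one value")
    n = len(partial)
    stride = 1
    rounds = 0
    while stride < n:
        # Image i receives and adds the partial sum held by image i + stride.
        for i in range(0, n - stride, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        rounds += 1
    return partial[0], rounds


if __name__ == "__main__":
    total, rounds = tree_sum(range(1, 33))  # 32 images, as on a 32-core server
    print(total, rounds)
```

A sequential summation over n images needs n - 1 dependent steps, whereas the tree needs only about log2(n) rounds, which is consistent with the improved scaling the abstract reports.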
Physics Computing '92: Proceedings of the 4th International Conference

NASA Astrophysics Data System (ADS)

de Groot, Robert A.; Nadrchal, Jaroslav

1993-04-01

The Table of Contents for the book is as follows: * Preface * INVITED PAPERS * Ab Initio Theoretical Approaches to the Structural, Electronic and Vibrational Properties of Small Clusters and Fullerenes: The State of the Art * Neural Multigrid Methods for Gauge Theories and Other Disordered Systems * Multicanonical Monte Carlo Simulations * On the Use of the Symbolic Language Maple in Physics and Chemistry: Several Examples * Nonequilibrium Phase Transitions in Catalysis and Population Models * Computer Algebra, Symmetry Analysis and Integrability of Nonlinear Evolution Equations * The Path-Integral Quantum Simulation of Hydrogen in Metals * Digital Optical Computing: A New Approach of Systolic Arrays Based on Coherence Modulation of Light and Integrated Optics Technology * Molecular Dynamics Simulations of Granular Materials * Numerical Implementation of a K.A.M. Algorithm * Quasi-Monte Carlo, Quasi-Random Numbers and Quasi-Error Estimates * What Can We Learn from QMC Simulations * Physics of Fluctuating Membranes * Plato, Apollonius, and Klein: Playing with Spheres * Steady States in Nonequilibrium Lattice Systems * CONVODE: A REDUCE Package for Differential Equations * Chaos in Coupled Rotators * Symplectic Numerical Methods for Hamiltonian Problems * Computer Simulations of Surfactant Self Assembly * High-dimensional and Very Large Cellular Automata for Immunological Shape Space * A Review of the Lattice Boltzmann Method * Electronic Structure of Solids in the Self-interaction Corrected Local-spin-density Approximation * Dedicated Computers for Lattice Gauge Theory Simulations * Physics Education: A Survey of Problems and Possible Solutions * Parallel Computing and Electronic-Structure Theory * High Precision Simulation Techniques for Lattice Field Theory * CONTRIBUTED PAPERS * Case Study of Microscale Hydrodynamics Using Molecular Dynamics and Lattice Gas Methods * Computer Modelling of the Structural and Electronic Properties of the Supported Metal Catalysis * 
Ordered Particle Simulations for Serial and MIMD Parallel Computers * "NOLP" -- Program Package for Laser Plasma Nonlinear Optics * Algorithms to Solve Nonlinear Least Square Problems * Distribution of Hydrogen Atoms in Pd-H Computed by Molecular Dynamics * A Ray Tracing of Optical System for Protein Crystallography Beamline at Storage Ring-SIBERIA-2 * Vibrational Properties of a Pseudobinary Linear Chain with Correlated Substitutional Disorder * Application of the Software Package Mathematica in Generalized Master Equation Method * Linelist: An Interactive Program for Analysing Beam-foil Spectra * GROMACS: A Parallel Computer for Molecular Dynamics Simulations * GROMACS Method of Virial Calculation Using a Single Sum * The Interactive Program for the Solution of the Laplace Equation with the Elimination of Singularities for Boundary Functions * Random-Number Generators: Testing Procedures and Comparison of RNG Algorithms * Micro-TOPIC: A Tokamak Plasma Impurities Code * Rotational Molecular Scattering Calculations * Orthonormal Polynomial Method for Calibrating of Cryogenic Temperature Sensors * Frame-based System Representing Basis of Physics * The Role of Massively Data-parallel Computers in Large Scale Molecular Dynamics Simulations * Short-range Molecular Dynamics on a Network of Processors and Workstations * An Algorithm for Higher-order Perturbation Theory in Radiative Transfer Computations * Hydrostochastics: The Master Equation Formulation of Fluid Dynamics * HPP Lattice Gas on Transputers and Networked Workstations * Study on the Hysteresis Cycle Simulation Using Modeling with Different Functions on Intervals * Refined Pruning Techniques for Feed-forward Neural Networks * Random Walk Simulation of the Motion of Transient Charges in Photoconductors * The Optical Hysteresis in Hydrogenated Amorphous Silicon * Diffusion Monte Carlo Analysis of Modern Interatomic Potentials for He * A Parallel Strategy for Molecular Dynamics Simulations of Polar Liquids on 
Transputer Arrays * Distribution of Ions Reflected on Rough Surfaces * The Study of Step Density Distribution During Molecular Beam Epitaxy Growth: Monte Carlo Computer Simulation * Towards a Formal Approach to the Construction of Large-scale Scientific Applications Software * Correlated Random Walk and Discrete Modelling of Propagation through Inhomogeneous Media * Teaching Plasma Physics Simulation * A Theoretical Determination of the Au-Ni Phase Diagram * Boson and Fermion Kinetics in One-dimensional Lattices * Computational Physics Course on the Technical University * Symbolic Computations in Simulation Code Development and Femtosecond-pulse Laser-plasma Interaction Studies * Computer Algebra and Integrated Computing Systems in Education of Physical Sciences * Coordinated System of Programs for Undergraduate Physics Instruction * Program Package MIRIAM and Atomic Physics of Extreme Systems * High Energy Physics Simulation on the T_Node * The Chapman-Kolmogorov Equation as Representation of Huygens' Principle and the Monolithic Self-consistent Numerical Modelling of Lasers * Authoring System for Simulation Developments * Molecular Dynamics Study of Ion Charge Effects in the Structure of Ionic Crystals * A Computational Physics Introductory Course * Computer Calculation of Substrate Temperature Field in MBE System * Multimagnetical Simulation of the Ising Model in Two and Three Dimensions * Failure of the CTRW Treatment of the Quasicoherent Excitation Transfer * Implementation of a Parallel Conjugate Gradient Method for Simulation of Elastic Light Scattering * Algorithms for Study of Thin Film Growth * Algorithms and Programs for Physics Teaching in Romanian Technical Universities * Multicanonical Simulation of 1st order Transitions: Interface Tension of the 2D 7-State Potts Model * Two Numerical Methods for the Calculation of Periodic Orbits in Hamiltonian Systems * Chaotic Behavior in a Probabilistic Cellular Automata? 
* Wave Optics Computing by a Networked-based Vector Wave Automaton * Tensor Manipulation Package in REDUCE * Propagation of Electromagnetic Pulses in Stratified Media * The Simple Molecular Dynamics Model for the Study of Thermalization of the Hot Nucleon Gas * Electron Spin Polarization in PdCo Alloys Calculated by KKR-CPA-LSD Method * Simulation Studies of Microscopic Droplet Spreading * A Vectorizable Algorithm for the Multicolor Successive Overrelaxation Method * Tetragonality of the CuAu I Lattice and Its Relation to Electronic Specific Heat and Spin Susceptibility * Computer Simulation of the Formation of Metallic Aggregates Produced by Chemical Reactions in Aqueous Solution * Scaling in Growth Models with Diffusion: A Monte Carlo Study * The Nucleus as the Mesoscopic System * Neural Network Computation as Dynamic System Simulation * First-principles Theory of Surface Segregation in Binary Alloys * Data Smooth Approximation Algorithm for Estimating the Temperature Dependence of the Ice Nucleation Rate * Genetic Algorithms in Optical Design * Application of 2D-FFT in the Study of Molecular Exchange Processes by NMR * Advanced Mobility Model for Electron Transport in P-Si Inversion Layers * Computer Simulation for Film Surfaces and its Fractal Dimension * Parallel Computation Techniques and the Structure of Catalyst Surfaces * Educational SW to Teach Digital Electronics and the Corresponding Text Book * Primitive Trinomials (Mod 2) Whose Degree is a Mersenne Exponent * Stochastic Modelisation and Parallel Computing * Remarks on the Hybrid Monte Carlo Algorithm for the φ⁴ Model * An Experimental Computer Assisted Workbench for Physics Teaching * A Fully Implicit Code to Model Tokamak Plasma Edge Transport * EXPFIT: An Interactive Program for Automatic Beam-foil Decay Curve Analysis * Mapping Technique for Solving General, 1-D Hamiltonian Systems * Freeway Traffic, Cellular Automata, and Some (Self-Organizing) Criticality * Photonuclear Yield Analysis by Dynamic
Programming * Incremental Representation of the Simply Connected Planar Curves * Self-convergence in Monte Carlo Methods * Adaptive Mesh Technique for Shock Wave Propagation * Simulation of Supersonic Coronal Streams and Their Interaction with the Solar Wind * The Nature of Chaos in Two Systems of Ordinary Nonlinear Differential Equations * Considerations of a Window-shopper * Interpretation of Data Obtained by RTP 4-Channel Pulsed Radar Reflectometer Using a Multi Layer Perceptron * Statistics of Lattice Bosons for Finite Systems * Fractal Based Image Compression with Affine Transformations * Algorithmic Studies on Simulation Codes for Heavy-ion Reactions * An Energy-Wise Computer Simulation of DNA-Ion-Water Interactions Explains the Abnormal Structure of Poly[d(A)]:Poly[d(T)] * Computer Simulation Study of Kosterlitz-Thouless-Like Transitions * Problem-oriented Software Package GUN-EBT for Computer Simulation of Beam Formation and Transport in Technological Electron-Optical Systems * Parallelization of a Boundary Value Solver and its Application in Nonlinear Dynamics * The Symbolic Classification of Real Four-dimensional Lie Algebras * Short, Singular Pulses Generation by a Dye Laser at Two Wavelengths Simultaneously * Quantum Monte Carlo Simulations of the Apex-Oxygen-Model * Approximation Procedures for the Axial Symmetric Static Einstein-Maxwell-Higgs Theory * Crystallization on a Sphere: Parallel Simulation on a Transputer Network * FAMULUS: A Software Product (also) for Physics Education * MathCAD vs. 
FAMULUS -- A Brief Comparison * First-principles Dynamics Used to Study Dissociative Chemisorption * A Computer Controlled System for Crystal Growth from Melt * A Time Resolved Spectroscopic Method for Short Pulsed Particle Emission * Green's Function Computation in Radiative Transfer Theory * Random Search Optimization Technique for One-criteria and Multi-criteria Problems * Hartley Transform Applications to Thermal Drift Elimination in Scanning Tunneling Microscopy * Algorithms of Measuring, Processing and Interpretation of Experimental Data Obtained with Scanning Tunneling Microscope * Time-dependent Atom-surface Interactions * Local and Global Minima on Molecular Potential Energy Surfaces: An Example of N3 Radical * Computation of Bifurcation Surfaces * Symbolic Computations in Quantum Mechanics: Energies in Next-to-solvable Systems * A Tool for RTP Reactor and Lamp Field Design * Modelling of Particle Spectra for the Analysis of Solid State Surface * List of Participants
Programming Probabilistic Structural Analysis for Parallel Processing Computer

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

1991-01-01

The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Robot Acting on Moving Bodies (RAMBO): Interaction with tumbling objects

NASA Technical Reports Server (NTRS)

Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madhu; Harwood, David

1989-01-01

Interaction with tumbling objects will become more common as human activities in space expand. Attempting to interact with a large complex object translating and rotating in space, a human operator using only his visual and mental capacities may not be able to estimate the object motion, plan actions or control those actions. A robot system (RAMBO) equipped with a camera, which, given a sequence of simple tasks, can perform these tasks on a tumbling object, is being developed. RAMBO is given a complete geometric model of the object. A low level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative locations nearby the object sufficient for achieving the tasks. More specifically, low level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Then trajectories are created using dynamic interpolations between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

NASA Technical Reports Server (NTRS)

Weeks, Cindy Lou

1986-01-01

Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Performance Implications of Synchronization Support for Parallel FORTRAN Programs

DTIC Science & Technology

1991-06-17

applications we used in this study are BDNA and FLO52. BDNA is a molecular dynamics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all... [Figure: cumulative percentage curves for BDNA and FLO52]
Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
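The idea of splitting gradient calculations across processes in SPMD style can be illustrated with a minimal sketch. This is not POST3D code: the central-difference gradient, the round-robin ownership of components, the names `partial_gradient` and `gather_gradient`, and the quadratic `objective` are all assumptions made for illustration, and the "ranks" are simulated in a serial loop rather than run under a real message-passing library.

```python
def partial_gradient(f, x, rank, size, h=1e-6):
    """Every 'rank' runs this same function (SPMD) but only evaluates the
    gradient components it owns, assigned round-robin: rank, rank+size, ..."""
    owned = {}
    for i in range(rank, len(x), size):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Central difference: two objective evaluations per component.
        owned[i] = (f(x_plus) - f(x_minus)) / (2.0 * h)
    return owned


def gather_gradient(parts, n):
    """Combine the per-rank pieces into one gradient vector (the 'gather' step)."""
    grad = [0.0] * n
    for part in parts:
        for i, value in part.items():
            grad[i] = value
    return grad


def objective(x):
    """Hypothetical stand-in for an expensive trajectory simulation."""
    return sum((i + 1) * v * v for i, v in enumerate(x))


x0 = [1.0, -2.0, 0.5, 3.0, -1.5]
size = 3  # number of simulated ranks
parts = [partial_gradient(objective, x0, rank, size) for rank in range(size)]
gradient = gather_gradient(parts, len(x0))
```

Because each component needs only independent objective evaluations, the work divides cleanly among ranks; in a real distributed run the list comprehension over ranks would be replaced by separate processes and a gather operation.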
Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

DTIC Science & Technology

2012-12-01

identity operation SIMD Single instruction, multiple datastream parallel computing Scala A byte-compiled programming language featuring dynamic type...Specific Languages 5a. CONTRACT NUMBER FA8750-10-1-0191 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 61101E 6. AUTHOR(S) Armando Fox 5d...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency
Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

PubMed

Bellucci, Michael A; Coker, David F

2011-07-28

We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
Concepts of Concurrent Programming

DTIC Science & Technology

1990-04-01

to the material presented. Carriero89 Carriero, N., and Gelernter, D. "How to Write Parallel Programs: A Guide to the Perplexed." ACM...between the architectures on which programs can be executed and the application domains from which problems are drawn. Our goal is to show how programs...Sept. 1989), 251-510. Abstract: There are four papers: 1. Programming Languages for Distributed Computing Systems (52); 2. How to Write Parallel
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei; Xu, Jingling

2006-01-01

This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrasts two methods of programming: Single Program Multiple Data (SPMD) and Navigational Programming (NavP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP. Case studies are presented. It also reviews the work that is being done to enable the NavP system.
High Performance Programming Using Explicit Shared Memory Model on Cray T3D

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained by using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times lower than that obtained using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache, etc.) while programming in the explicit shared memory model are discussed. Comparative performance of NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
On program restructuring, scheduling, and communication for parallel processor systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Polychronopoulos, Constantine D.

1986-08-01

This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
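Loop coalescing, as the term is generally understood, replaces a nest of loops with a single loop over the combined iteration space and recovers the original indices arithmetically. The sketch below illustrates that general idea only; it is not taken from the dissertation or from Parafrase, and the helper names and the block schedule are assumptions chosen for the example.

```python
def nested_pairs(n, m):
    """Reference order of a doubly nested loop: i outer, j inner."""
    return [(i, j) for i in range(n) for j in range(m)]


def coalesced_pairs(n, m):
    """Same iterations from a single loop over k = i*m + j."""
    pairs = []
    for k in range(n * m):
        i, j = divmod(k, m)  # recover the original indices from k
        pairs.append((i, j))
    return pairs


def block_schedule(total, workers):
    """Split range(total) into contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(total, workers)
    blocks = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


n, m, workers = 3, 5, 4
blocks = block_schedule(n * m, workers)
# Each worker maps its share of k values back to (i, j) pairs.
per_worker = [[divmod(k, m) for k in block] for block in blocks]
```

With only the outer loop parallelized, 3 iterations cannot keep 4 workers busy; after coalescing, the 15 combined iterations split into blocks of 4, 4, 4 and 3.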
A molecular dynamics implementation of the 3D Mercedes-Benz water model

NASA Astrophysics Data System (ADS)

Hynninen, T.; Dias, C. L.; Mkrtchyan, A.; Heinonen, V.; Karttunen, M.; Foster, A. S.; Ala-Nissila, T.

2012-02-01

The three-dimensional Mercedes-Benz model was recently introduced to account for the structural and thermodynamic properties of water. It treats water molecules as point-like particles with four dangling bonds in tetrahedral coordination, representing H-bonds of water. Its conceptual simplicity renders the model attractive in studies where complex behaviors emerge from H-bond interactions in water, e.g., the hydrophobic effect. A molecular dynamics (MD) implementation of the model is non-trivial and we outline here the mathematical framework of its force-field. Useful routines written in modern Fortran are also provided. This open source code is free and can easily be modified to account for different physical contexts. The provided code allows both serial and MPI-parallelized execution.
Program summary
Program title: CASHEW (Coarse Approach Simulator for Hydrogen-bonding Effects in Water)
Catalogue identifier: AEKM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEKM_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 20 501
No. of bytes in distributed program, including test data, etc.: 551 044
Distribution format: tar.gz
Programming language: Fortran 90
Computer: Program has been tested on desktop workstations and a Cray XT4/XT5 supercomputer.
Operating system: Linux, Unix, OS X
Has the code been vectorized or parallelized?: The code has been parallelized using MPI.
RAM: Depends on size of system, about 5 MB for 1500 molecules.
Classification: 7.7
External routines: A random number generator, Mersenne Twister (http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/VERSIONS/FORTRAN/mt95.f90), is used. A copy of the code is included in the distribution.
Nature of problem: Molecular dynamics simulation of a new geometric water model.
Solution method: New force-field for water molecules, velocity-Verlet integration, representation of molecules as rigid particles with rotations described using quaternion algebra.
Restrictions: Memory and CPU time limit the size of simulations.
Additional comments: Software web site: https://gitorious.org/cashew/.
Running time: Depends on the size of system. The sample tests provided only take a few seconds.
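The velocity-Verlet integration named in the solution method can be sketched in a few lines. This is the textbook scheme applied to a one-dimensional harmonic oscillator, not code from CASHEW (which is written in Fortran); the spring force, unit mass, and time step are assumptions chosen for illustration.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v by `steps` velocity-Verlet steps."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update with current acceleration
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with the averaged acceleration
        a = a_new
    return x, v


def spring_force(x, k=1.0):
    """Hypothetical test force: Hooke's law with spring constant k."""
    return -k * x


# Start at x = 1, v = 0 and integrate to t = 10 (1000 steps of 0.01).
x_end, v_end = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, steps=1000)
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2   # exact value is 0.5
exact_x = math.cos(10.0)                        # analytic solution for comparison
```

The scheme needs one force evaluation per step and keeps the total energy close to its initial value over long runs, which is a common reason for choosing it in molecular dynamics.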
Solar wind interaction with Venus and Mars in a parallel hybrid code

NASA Astrophysics Data System (ADS)

Jarvinen, Riku; Sandroos, Arto

2013-04-01

We discuss the development and applications of a new parallel hybrid simulation, where ions are treated as particles and electrons as a charge-neutralizing fluid, for the interaction between the solar wind and Venus and Mars. The new simulation code under construction is based on the algorithm of the sequential global planetary hybrid model developed at the Finnish Meteorological Institute (FMI) and on the Corsair parallel simulation platform also developed at the FMI. The FMI's sequential hybrid model has been used for studies of plasma interactions of several unmagnetized and weakly magnetized celestial bodies for more than a decade. In particular, the model has been used to interpret in situ particle and magnetic field observations from plasma environments of Mars, Venus and Titan. Corsair is an open source MPI (Message Passing Interface) particle and mesh simulation platform, aimed mainly at simulations of diffusive shock acceleration in the solar corona and interplanetary space, which is now also being extended for global planetary hybrid simulations. In this presentation we discuss challenges and strategies of parallelizing a legacy simulation code as well as possible applications and prospects of a scalable parallel hybrid model for the solar wind interactions of Venus and Mars.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue

NASA Astrophysics Data System (ADS)

Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.

2017-11-01

The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed for complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).

Modelling parallel programs and multiprocessor architectures with AXE

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.

1991-01-01

AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Event Generators for Simulating Heavy Ion Interactions of Interest in Evaluating Risks in Human Spaceflight

NASA Technical Reports Server (NTRS)

Wilson, Thomas L.; Pinsky, Lawrence; Andersen, Victor; Empl, Anton; Lee, Kerry; Smirmov, Georgi; Zapp, Neal; Ferrari, Alfredo; Tsoulou, Katerina; Roesler, Stefan;

2005-01-01

Simulating the space radiation environment with Monte Carlo codes, such as FLUKA, requires the ability to model the interactions of heavy ions as they penetrate spacecraft and crew members' bodies. Monte-Carlo-type transport codes use total interaction cross sections to determine probabilistically when a particular type of interaction has occurred. Then, at that point, a distinct event generator is employed to determine separately the results of that interaction. The space radiation environment contains a full spectrum of radiation types, including relativistic nuclei, which are the most important component for the evaluation of crew doses. Interactions of incident protons with target nuclei in the spacecraft materials and crew members' bodies are well understood. However, the situation is substantially less comfortable for incident heavier nuclei (heavy ions). We have been engaged in developing several related heavy ion interaction models based on a Quantum Molecular Dynamics-type approach for energies up through about 5 GeV per nucleon (GeV/A) as part of a NASA Consortium that includes a parallel program of cross section measurements to guide and verify this code development.
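The probabilistic use of total cross sections described above can be sketched with the standard textbook sampling rules. This is not FLUKA code: the function names, the macroscopic cross-section values, and the two interaction labels are hypothetical, and a real transport code would hand the selected interaction to an event generator rather than stop there.

```python
import math
import random


def sample_free_path(sigma_total, rng):
    """Distance to the next interaction for macroscopic cross section
    sigma_total (1/cm), drawn from the exponential law p(s) = S*exp(-S*s)."""
    return -math.log(1.0 - rng.random()) / sigma_total


def choose_interaction(partials, rng):
    """Pick an interaction type with probability proportional to its partial cross section."""
    total = sum(partials.values())
    u = rng.random() * total
    cumulative = 0.0
    for name, sigma in partials.items():
        cumulative += sigma
        if u < cumulative:
            return name
    return name  # guard against rounding at the upper edge


rng = random.Random(12345)
partials = {"elastic": 0.2, "inelastic": 0.3}   # made-up values, 1/cm
sigma_total = sum(partials.values())
paths = [sample_free_path(sigma_total, rng) for _ in range(20000)]
mean_path = sum(paths) / len(paths)             # should approach 1/sigma_total
kind = choose_interaction(partials, rng)
```

The mean sampled path length approaches the mean free path 1/sigma_total, which gives a quick sanity check on the sampler.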

Web Based Parallel Programming Workshop for Undergraduate Education.

ERIC Educational Resources Information Center

Marcus, Robert L.; Robertson, Douglass

Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide Web-based workshop on high performance computing entitled "IBM SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
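A loose Python analogy may help convey the Normalize-Transpose idea: when an operation defined on scalars receives sequences, scalar arguments are replicated to match (normalize), the arguments are regrouped element by element (transpose), and the operation is applied to each group independently. The function below is an illustrative assumption about that behavior, not SequenceL's actual semantics or implementation, and the name `normalize_transpose` is invented for the example.

```python
import operator


def normalize_transpose(op, *args):
    """Apply a scalar operation to arguments that may be nested lists."""
    sequences = [a for a in args if isinstance(a, list)]
    if not sequences:
        return op(*args)                      # all scalars: apply directly
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequence arguments must have equal length")
    # Normalize: replicate each scalar argument to the common length.
    normalized = [a if isinstance(a, list) else [a] * length for a in args]
    # Transpose: regroup so each tuple holds one element per argument.
    transposed = zip(*normalized)
    # Each group is independent of the others, so these calls could run in parallel.
    return [normalize_transpose(op, *group) for group in transposed]


scaled = normalize_transpose(operator.mul, [1, 2, 3], 10)
shifted = normalize_transpose(operator.add, [[1, 2], [3, 4]], 1)
```

The independence of the per-element calls is the property that makes automatic parallelization plausible, since no call reads a value written by another.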
Instrumentation, performance visualization, and debugging tools for multiprocessors

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

1991-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging and tuning parallel programs become intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Scalable and portable visualization of large atomistic datasets

NASA Astrophysics Data System (ADS)

Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

2004-10-01

A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms that have a high probability of being visible. Finally, a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms.
Program summary
Title of program: Atomsviewer
Catalogue identifier: ADUM
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUM
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card
Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5
Programming languages used: C++, C and OpenGL
Memory required to execute with typical data: 1 gigabyte of RAM
High speed storage required: 60 gigabytes
No. of lines in the distributed program including test data, etc.: 550 241
No. of bytes in the distributed program including test data, etc.: 6 258 245
Number of bits in a word: Arbitrary
Number of processors used: 1
Has the code been vectorized or parallelized: No
Distribution format: tar gzip file
Nature of physical problem: Scientific visualization of atomic systems
Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data minimization, and levels-of-detail for minimal rendering
Restrictions on the complexity of the problem: None
Typical running time: The program is interactive in its execution
Unusual features of the program: None
References: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence: Teleoperators and Virtual Environments 12 (1) (2003)].
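The hierarchical view frustum culling mentioned in the abstract can be illustrated with a small octree sketch. It is not Atomsviewer code (which is C++/OpenGL): the `Node` class, the leaf size, and the representation of the viewing volume as a list of planes `(normal, offset)` with the inside defined by `normal . p + offset >= 0` are assumptions for the example, and a simple slab stands in for a real six-plane frustum.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                      # minimum corner of the cell
    hi: tuple                      # maximum corner of the cell
    atoms: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build(points, lo, hi, leaf_size=4, depth=0, max_depth=8):
    """Recursively split a cell into up to eight octants until it holds few atoms."""
    node = Node(lo, hi)
    if len(points) <= leaf_size or depth == max_depth:
        node.atoms = list(points)
        return node
    mid = tuple((a + b) / 2.0 for a, b in zip(lo, hi))
    buckets = {}
    for p in points:
        key = tuple(p[d] >= mid[d] for d in range(3))
        buckets.setdefault(key, []).append(p)
    for key, pts in buckets.items():
        child_lo = tuple(mid[d] if key[d] else lo[d] for d in range(3))
        child_hi = tuple(hi[d] if key[d] else mid[d] for d in range(3))
        node.children.append(build(pts, child_lo, child_hi, leaf_size, depth + 1, max_depth))
    return node


def signed(plane, p):
    normal, offset = plane
    return sum(normal[d] * p[d] for d in range(3)) + offset


def box_outside(lo, hi, plane):
    """A box is fully outside a plane if even its farthest corner along the normal is outside."""
    normal, _ = plane
    corner = tuple(hi[d] if normal[d] >= 0 else lo[d] for d in range(3))
    return signed(plane, corner) < 0


def cull(node, planes, visible):
    """Collect atoms inside all planes, skipping whole subtrees that lie outside."""
    if any(box_outside(node.lo, node.hi, pl) for pl in planes):
        return visible             # entire cell rejected without visiting its atoms
    if node.children:
        for child in node.children:
            cull(child, planes, visible)
    else:
        visible.extend(p for p in node.atoms if all(signed(pl, p) >= 0 for pl in planes))
    return visible


atoms = [(i / 10, j / 10, k / 10) for i in range(10) for j in range(10) for k in range(10)]
root = build(atoms, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
slab = [((1, 0, 0), -0.25), ((-1, 0, 0), 0.5)]   # keeps 0.25 <= x <= 0.5
visible = cull(root, slab, [])
brute_force = [p for p in atoms if all(signed(pl, p) >= 0 for pl in slab)]
```

The benefit comes from the early return: a cell that fails the box test discards all atoms beneath it at once, so the cost depends on the number of cells near the view boundary rather than on the total atom count.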
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Parallel computation with the force

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1985-01-01

A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.

2003-01-01

Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Marionette

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sullivan, M.; Anderson, D.P.

1988-01-01

Marionette is a system for distributed parallel programming in an environment of networked heterogeneous computer systems. It is based on a master/slave model. The master process can invoke worker operations (asynchronous remote procedure calls to single slaves) and context operations (updates to the state of all slaves). The master and slaves also interact through shared data structures that can be modified only by the master. The master and slave processes are programmed in a sequential language. The Marionette runtime system manages slave process creation, propagates shared data structures to slaves as needed, queues and dispatches worker and context operations, and manages recovery from slave processor failures. The Marionette system also includes tools for automated compilation of program binaries for multiple architectures, and for distributing binaries to remote file systems. A UNIX-based implementation of Marionette is described.
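The master/slave model in the Marionette abstract can be illustrated with a minimal sketch. It is not Marionette's interface: the class and method names (`Master`, `Worker`, `worker_op`, `context_op`) are hypothetical, threads in one process stand in for remote slave processes, and failure recovery is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical stand-in for one slave process."""

    def __init__(self):
        self.context = {}  # state changed only by the master's context operations

    def apply_context(self, key, value):
        self.context[key] = value

    def run(self, func, arg):
        return func(self.context, arg)


class Master:
    """Hypothetical master: dispatches worker and context operations."""

    def __init__(self, n_workers):
        self.workers = [Worker() for _ in range(n_workers)]
        self.pool = ThreadPoolExecutor(max_workers=n_workers)
        self.next_worker = 0

    def context_op(self, key, value):
        # Context operation: update the state of all workers.
        for worker in self.workers:
            worker.apply_context(key, value)

    def worker_op(self, func, arg):
        # Worker operation: asynchronous call sent to a single worker
        # (round-robin choice is an assumption of this sketch).
        worker = self.workers[self.next_worker]
        self.next_worker = (self.next_worker + 1) % len(self.workers)
        return self.pool.submit(worker.run, func, arg)

    def shutdown(self):
        self.pool.shutdown()


def scale(context, x):
    # Work function: reads the shared context, never modifies it.
    return context["factor"] * x


def demo():
    master = Master(n_workers=3)
    master.context_op("factor", 10)
    futures = [master.worker_op(scale, x) for x in range(5)]
    results = [f.result() for f in futures]
    master.shutdown()
    return results
```

Calling `demo()` collects the five results in submission order. The work function only reads the context, mirroring the rule that shared data can be modified only by the master.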
76 FR 62808 - Pilot Program for Parallel Review of Medical Products

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-11

... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...
Development of a stereo analysis algorithm for generating topographic maps using interactive techniques of the MPP

NASA Technical Reports Server (NTRS)

Strong, James P.

1987-01-01

A local area matching algorithm was developed on the Massively Parallel Processor (MPP). It is an iterative technique that first matches coarse or low resolution areas and at each iteration performs matches of higher resolution. Results so far show that when good matches are possible in the two images, the MPP algorithm matches corresponding areas as well as a human observer. To aid in developing this algorithm, a control or shell program was developed for the MPP that allows interactive experimentation with various parameters and procedures to be used in the matching process. (This would not be possible without the high speed of the MPP). With the system, optimal techniques can be developed for different types of matching problems.
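The coarse-to-fine idea in this abstract (match low-resolution areas first, then refine at each higher resolution) can be sketched in one dimension. This is an illustrative serial sketch, not the MPP algorithm: the function names, the averaging pyramid, and the mean-absolute-difference cost are all assumptions.

```python
def downsample(signal, factor):
    """Average consecutive groups of `factor` samples to get a coarse view."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def match_cost(left, right, shift):
    """Mean absolute difference between left[i] and right[i + shift]."""
    total, count = 0.0, 0
    for i, value in enumerate(left):
        j = i + shift
        if 0 <= j < len(right):
            total += abs(value - right[j])
            count += 1
    return total / count if count else float("inf")


def coarse_to_fine_shift(left, right, levels=3, radius=2):
    """Estimate the shift s such that right[i + s] best matches left[i]."""
    shift = 0
    for level in range(levels - 1, -1, -1):      # coarsest level first
        factor = 2 ** level
        coarse_left = downsample(left, factor)
        coarse_right = downsample(right, factor)
        # Search only a small window around the estimate from the coarser level.
        candidates = range(shift - radius, shift + radius + 1)
        shift = min(candidates,
                    key=lambda s: match_cost(coarse_left, coarse_right, s))
        if level > 0:
            shift *= 2                           # rescale for the next finer level
    return shift
```

Each level searches only `2 * radius + 1` candidate shifts, so a large displacement is found without an exhaustive full-resolution search. The `levels` and `radius` parameters are the kind of settings an interactive shell would let a user vary.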
Algorithms and programming tools for image processing on the MPP

NASA Technical Reports Server (NTRS)

Reeves, A. P.

1985-01-01

Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Execution models for mapping programs onto distributed memory parallel computers

NASA Technical Reports Server (NTRS)

Sussman, Alan

1992-01-01

The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Program Correctness, Verification and Testing for Exascale (Corvette)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sen, Koushik; Iancu, Costin; Demmel, James W

The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partial program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.
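Item 4 concerns reproducible floating-point execution. The sketch below shows why this is a problem: floating-point addition is not associative, so a parallel reduction that changes the summation order can change the result. Python's `math.fsum` is used as one order-independent alternative; it is a stand-in for illustration, not the project's technique, and the sample values are made up.

```python
import math


def naive_sum(values):
    """Left-to-right summation; the result depends on the order of `values`."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers in two different orders, as two different
# reduction schedules might see them.
ORDER_A = [1e17, 1.0, -1e17, 1.0]
ORDER_B = [1e17, -1e17, 1.0, 1.0]


def compare():
    """Return (naive A, naive B, fsum A, fsum B) for the two orderings."""
    return (naive_sum(ORDER_A), naive_sum(ORDER_B),
            math.fsum(ORDER_A), math.fsum(ORDER_B))
```

In `ORDER_A` the first `1.0` is absorbed when added to `1e17`, so the naive sum loses it. `math.fsum` tracks the lost low-order bits and returns the same correctly rounded value for both orderings.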
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Trace-Driven Debugging of Message Passing Programs

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

1998-01-01

In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

NASA Astrophysics Data System (ADS)

Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

1995-03-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simunovic, S.; Zacharia, T.; Baltas, N.

1995-04-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
RAM: Tested on Problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
8th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG): April 16–18, 2015, Republic of Malta

PubMed Central

Homberg, Anita; Hinzmann, Rolf

2015-01-01

Abstract International experts in the fields of diabetes, diabetes technology, endocrinology, mobile health, sport science, and regulatory issues gathered for the 8th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG) with a focus on personalized diabetes management. The aim of this meeting was to facilitate new collaborations and research projects to improve the lives of people with diabetes. The 2015 meeting comprised a comprehensive scientific program, parallel interactive workshops, and two keynote lectures. PMID:26496678
LB3D: A parallel implementation of the Lattice-Boltzmann method for simulation of interacting amphiphilic fluids

NASA Astrophysics Data System (ADS)

Schmieschek, S.; Shamardin, L.; Frijters, S.; Krüger, T.; Schiller, U. D.; Harting, J.; Coveney, P. V.

2017-08-01

We introduce the lattice-Boltzmann code LB3D, version 7.1. Building on a parallel program and supporting tools which have enabled research utilising high performance computing resources for nearly two decades, LB3D version 7 provides a subset of the research code functionality as an open source project. Here, we describe the theoretical basis of the algorithm as well as computational aspects of the implementation. The software package is validated against simulations of meso-phases resulting from self-assembly in ternary fluid mixtures comprising immiscible and amphiphilic components such as water-oil-surfactant systems. The impact of the surfactant species on the dynamics of spinodal decomposition is tested, and a quantitative measurement of the permeability of a body centred cubic (BCC) model porous medium for a simple binary mixture is described. Single-core performance and scaling behaviour of the code are reported for simulations on current supercomputer architectures.
Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

NASA Technical Reports Server (NTRS)

Kiris, Cetin C.; Kwak, Dochan; Chan, William

2000-01-01

This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/OpenMP versions of the INS3D code. A computational model of a turbo-pump has then been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for the SSME turbo-pump, which contains 136 zones with 35 million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code, will be presented in the final paper.
ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snyder, L.; Notkin, D.; Adams, L.

1990-03-31

This task relates to research on programming massively parallel computers. Previous work on the Ensemble concept of programming was extended, and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensemble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computation. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph.D. thesis was completed, and one book chapter and six conference proceedings were published.
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2002-01-01

Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
Analysis of Plane-Parallel Electron Beam Propagation in Different Media by Numerical Simulation Methods

NASA Astrophysics Data System (ADS)

Miloichikova, I. A.; Bespalov, V. I.; Krasnykh, A. A.; Stuchebrov, S. G.; Cherepennikov, Yu. M.; Dusaev, R. R.

2018-04-01

Monte Carlo simulation is widely used to calculate how ionizing radiation interacts with matter. The wide variety of programs based on this method allows users to choose the most suitable package for a given computational problem. At the same time, it is important to know the exact limitations of each numerical system in order to avoid gross errors. This work presents an assessment of the feasibility of applying the program PCLab (Computer Laboratory, version 9.9) to numerical simulation of the electron energy distribution absorbed in beryllium, aluminum, gold, and water for industrial, research, and clinical beams. Data obtained with PCLab and with ITS and Geant4, the most popular software packages for such problems, are presented in graphical form. A comparison and analysis of the results demonstrate that PCLab is suitable for simulating the absorbed energy distribution and dose of electrons in various materials for energies in the range 1-20 MeV.
A Survey of Parallel Sorting Algorithms.

DTIC Science & Technology

1981-12-01

see that, in this algorithm, each Processor i, for 1 ≤ i ≤ p-2, interacts directly only with Processors i+1 and i-1. Processor j 0 only interacts with...Chan76] Chandra, A.K., "Maximal Parallelism in Matrix Multiplication," IBM Report RC. 6193, Watson Research Center, Yorktown Heights, N.Y., October 1976
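The excerpt does not name the algorithm it describes. One well-known parallel sort in which each processor exchanges data only with its two immediate neighbours is odd-even transposition sort; treating it as the algorithm in question is an assumption. The sketch below simulates it sequentially, with one list element per hypothetical processor.

```python
def odd_even_transposition_sort(values):
    """Sort by repeated compare-exchange between neighbouring positions.

    Position i plays the role of Processor i. In each phase, disjoint
    neighbour pairs (i, i + 1) compare and swap; on a real machine all
    pairs within a phase would operate at the same time.
    """
    a = list(values)
    p = len(a)
    for phase in range(p):               # p phases are enough for p elements
        start = phase % 2                # even phases: (0,1), (2,3), ...
        for i in range(start, p - 1, 2): # odd phases:  (1,2), (3,4), ...
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

The end positions have only one neighbour each, which is consistent with the excerpt treating the first processor as a special case.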
Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

ERIC Educational Resources Information Center

Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

2013-01-01

Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…
Quantum statistics and squeezing for a microwave-driven interacting magnon system.

PubMed

Haghshenasfard, Zahra; Cottam, Michael G

2017-02-01

Theoretical studies are reported for the statistical properties of a microwave-driven interacting magnon system. Both the magnetic dipole-dipole and the exchange interactions are included and the theory is developed for the case of parallel pumping allowing for the inclusion of the nonlinear processes due to the four-magnon interactions. The method of second quantization is used to transform the total Hamiltonian from spin operators to boson creation and annihilation operators. By using the coherent magnon state representation we have studied the magnon occupation number and the statistical behavior of the system. In particular, it is shown that the nonlinearities introduced by the parallel pumping field and the four-magnon interactions lead to non-classical quantum statistical properties of the system, such as magnon squeezing. Also control of the collapse-and-revival phenomena for the time evolution of the average magnon number is demonstrated by varying the parallel pumping amplitude and the four-magnon coupling.
The parallel programming of voluntary and reflexive saccades.

PubMed

Walker, Robin; McSorley, Eugene

2006-06-01

A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.

2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. 
Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction to the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline (the prototypical non-data-parallel algorithm) can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
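The incremental approach described in this abstract, in which non-distributed and distributed arrays coexist and are mapped back and forth during conversion, can be sketched as follows. This is not the Charon API: every function name is hypothetical, the "processors" are simulated as a list of local pieces in one process, and a one-dimensional block distribution is assumed.

```python
def block_bounds(n, nprocs, rank):
    """Half-open index range [lo, hi) owned by `rank` in a block distribution."""
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def to_distributed(global_array, nprocs):
    """Map a non-distributed array onto per-processor local pieces."""
    n = len(global_array)
    pieces = []
    for rank in range(nprocs):
        lo, hi = block_bounds(n, nprocs, rank)
        pieces.append(global_array[lo:hi])
    return pieces


def to_global(pieces):
    """Map local pieces back to one non-distributed array."""
    return [x for piece in pieces for x in piece]


def distributed_scale(pieces, factor):
    """A step already converted: each processor works on its own piece."""
    return [[factor * x for x in piece] for piece in pieces]


def legacy_smooth(u):
    """A legacy step not yet converted: needs the whole array at once."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def mixed_pipeline(u, nprocs):
    """Half-converted program: distributed step, map back, then legacy step."""
    pieces = distributed_scale(to_distributed(u, nprocs), 2.0)
    return legacy_smooth(to_global(pieces))
```

At this intermediate stage `mixed_pipeline` can be checked against the fully serial program, which is the kind of per-step correctness check that incremental parallelization makes possible. Converting `legacy_smooth` would then replace the `to_global` call with neighbour communication.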
78 FR 76628 - Pilot Program for Parallel Review of Medical Products; Extension of the Duration of the Program

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-18

...The Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) (the Agencies) are announcing the extension of the ``Pilot Program for Parallel Review of Medical Products.'' The Agencies have decided to continue the program as currently designed for an additional period of 2 years from the date of publication of this notice.
Biocellion: accelerating computer simulation of multicellular biological system models.

PubMed

Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

2014-11-01

Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Integrated Task and Data Parallel Programming

NASA Technical Reports Server (NTRS)

Grimshaw, A. S.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications.
1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset.
1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Integrated Task And Data Parallel Programming: Language Design

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications.
1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset.
1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Electromagnetic backscattering from one-dimensional drifting fractal sea surface I: Wave-current coupled model

NASA Astrophysics Data System (ADS)

Tao, Xie; Shang-Zhuo, Zhao; William, Perrie; He, Fang; Wen-Jin, Yu; Yi-Jun, He

2016-06-01

To study the electromagnetic backscattering from a one-dimensional drifting fractal sea surface, a fractal sea surface wave-current model is derived, based on the mechanism of wave-current interactions. The numerical results show the effect of the ocean current on the wave. Wave amplitude decreases, wavelength and kurtosis of wave height increase, spectrum intensity decreases and shifts towards lower frequencies when the current occurs parallel to the direction of the ocean wave. By comparison, wave amplitude increases, wavelength and kurtosis of wave height decrease, spectrum intensity increases and shifts towards higher frequencies if the current is in the opposite direction to the direction of ocean wave. The wave-current interaction effect of the ocean current is much stronger than that of the nonlinear wave-wave interaction. The kurtosis of the nonlinear fractal ocean surface is larger than that of linear fractal ocean surface. The effect of the current on skewness of the probability distribution function is negligible. Therefore, the ocean wave spectrum is notably changed by the surface current and the change should be detectable in the electromagnetic backscattering signal. Project supported by the National Natural Science Foundation of China (Grant No. 41276187), the Global Change Research Program of China (Grant No. 2015CB953901), the Priority Academic Development Program of Jiangsu Higher Education Institutions (PAPD), Program for the Innovation Research and Entrepreneurship Team in Jiangsu Province, China, the Canadian Program on Energy Research and Development, and the Canadian World Class Tanker Safety Service.
Fabrication of First 4-m Coils for the LARP MQXFA Quadrupole and Assembly in Mirror Structure

DOE PAGES

Holik, Eddie Frank; Ambrosio, Giorgio; Anerella, Michael; ...

2017-01-23

The US LHC Accelerator Research Program is constructing prototype interaction region quadrupoles as part of the US in-kind contribution to the Hi-Lumi LHC project. The low-beta MQXFA Q1/Q3 coils have a 4-m length and a 150 mm bore. The design is first validated on short, one meter models (MQXFS) developed as part of the longstanding Nb3Sn quadrupole R&D by LARP in collaboration with CERN. In parallel, facilities and tooling are being developed and refined at BNL, LBNL, and FNAL to enable long coil production, assembly, and cold testing. Long length scale-up is based on the experience from the LARP 90 mm aperture (TQ-LQ) and 120 mm aperture (HQ and Long HQ) programs. A 4-m long MQXF practice coil was fabricated, water jet cut and analyzed to verify procedures, parts, and tooling. In parallel, the first complete prototype coil (QXFP01a) was fabricated and assembled in a long magnetic mirror, MQXFPM1, to provide early feedback on coil design and fabrication following the successful experience of previous LARP mirror tests.
Progress in Computational Simulation of Earthquakes

NASA Technical Reports Server (NTRS)

Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert

2006-01-01

GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST(P) has involved coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency on many hundreds of parallel processors (see figure).
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

Anisotropic Surface State Mediated RKKY Interaction Between Adatoms on a Hexagonal Lattice

NASA Astrophysics Data System (ADS)

Einstein, Theodore; Patrone, Paul

2012-02-01

Motivated by recent numerical studies of Ag on Pt(111), we derive a far-field expression for the RKKY interaction mediated by surface states on a (111) FCC surface, considering the effect of anisotropy in the Fermi edge. The main contribution to the interaction comes from electrons whose Fermi velocity vF is parallel to the vector R connecting the interacting adatoms; we show that in general, the corresponding Fermi wave-vector kF is not parallel to R. The interaction is oscillatory; the amplitude and wavelength of oscillations have angular dependence arising from the anisotropy of the surface state band structure. The wavelength, in particular, is determined by the component of the aforementioned kF that is parallel to R. Our analysis is easily generalized to other systems. For Ag on Pt(111), our results indicate that the RKKY interaction between pairs of adatoms should be nearly isotropic and so cannot account for the anisotropy found in the studies motivating our work.
Do all roads lead to Rome? The role of neuro-immune interactions before birth in the programming of offspring obesity

PubMed Central

Jasoni, Christine L.; Sanders, Tessa R.; Kim, Dong Won

2015-01-01

The functions of the nervous system can be powerfully modulated by the immune system. Although traditionally considered to be quite separate, neuro-immune interactions are increasingly recognized as critical for both normal and pathological nervous system function in the adult. However, a growing body of information supports a critical role for neuro-immune interactions before birth, particularly in the prenatal programming of later-life neurobehavioral disease risk. This review will focus on maternal obesity, as it represents an environment of pathological immune system function during pregnancy that elevates offspring neurobehavioral disease risk. We will first delineate the normal role of the immune system during pregnancy, including the role of the placenta as both a barrier and relayer of inflammatory information between the maternal and fetal environments. This will be followed by the current exciting findings of how immuno-modulatory molecules may elevate offspring risk of neurobehavioral disease by altering brain development and, consequently, later life function. Finally, by drawing parallels with pregnancy complications other than obesity, we will suggest that aberrant immune activation, irrespective of its origin, may lead to neuro-immune interactions that otherwise would not exist in the developing brain. These interactions could conceivably derail normal brain development and/or later life function, and thereby elevate risk for obesity and other neurobehavioral disorders later in the offspring's life. PMID:25691854
Describing, using 'recognition cones'. [parallel-series model with English-like computer program]

NASA Technical Reports Server (NTRS)

Uhr, L.

1973-01-01

A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.
PISCES: An environment for parallel scientific computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
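As a rough illustration of the Lanczos idea named in this record, the sketch below is a serial NumPy version and not the ANSI/ISO C and MPI program described. The function name `lanczos_ritz_values`, the `matvec` callback, and the use of full reorthogonalization are assumptions made for the example. It builds a small tridiagonal matrix T whose eigenvalues (Ritz values) approximate the extreme eigenvalues of a Hermitian matrix that is only accessed through matrix-vector products.

```python
import numpy as np


def lanczos_ritz_values(matvec, n, steps, seed=0):
    """Return eigenvalue estimates (Ritz values) after `steps` Lanczos iterations.

    matvec applies the Hermitian matrix to a vector (hypothetical callback name).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    basis = [v / np.linalg.norm(v)]
    alphas, betas = [], []
    for j in range(steps):
        w = matvec(basis[j])
        alpha = np.vdot(basis[j], w).real  # diagonal entry of T
        alphas.append(alpha)
        # Full reorthogonalization (an assumption for numerical robustness):
        # remove the components of w along every stored basis vector.
        for q in basis:
            w = w - np.vdot(q, w) * q
        beta = np.linalg.norm(w)  # off-diagonal entry of T
        if j == steps - 1 or beta < 1e-12:
            break
        betas.append(beta)
        basis.append(w / beta)
    # Small symmetric tridiagonal matrix; its eigenvalues are the Ritz values.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.linalg.eigvalsh(T)
```

A call such as `lanczos_ritz_values(lambda x: A @ x, A.shape[0], 30)` only needs the product `A @ x`, which is why this kind of method suits large sparse matrices: the matrix itself is never factored or stored densely.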
Parallelization of elliptic solver for solving 1D Boussinesq model

NASA Astrophysics Data System (ADS)

Tarwidi, D.; Adytia, D.

2018-03-01

In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grids is varied from 2^8 to 2^14. Two test cases, the propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against the analytical solutions for the solitary and standing waves. The best speedup for the solitary and standing wave test cases is about 2.07 with 2^14 grids and 1.86 with 2^13 grids, respectively, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
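The cyclic reduction step mentioned in this record can be sketched as follows. This is a minimal recursive NumPy version under stated assumptions, not the authors' OpenMP code: the function name `cyclic_reduction` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are chosen for illustration. Each level eliminates the even-indexed unknowns, and the updates within a level are independent of one another, which is the property a threaded implementation would exploit. Here NumPy vectorization stands in for the threads.

```python
import numpy as np


def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x."""
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = b.size
    a[0] = 0.0   # no neighbour to the left of the first unknown
    c[-1] = 0.0  # no neighbour to the right of the last unknown
    if n == 1:
        return d / b

    odd = np.arange(1, n, 2)
    lo = odd - 1
    hi = np.minimum(odd + 1, n - 1)  # clamped; harmless because c[-1] == 0
    alpha = a[odd] / b[lo]
    gamma = c[odd] / b[hi]

    # Reduced tridiagonal system in the odd-indexed unknowns only.
    # Every entry of these arrays can be computed independently.
    x_odd = cyclic_reduction(
        -alpha * a[lo],
        b[odd] - alpha * c[lo] - gamma * a[hi],
        -gamma * c[hi],
        d[odd] - alpha * d[lo] - gamma * d[hi],
    )

    # Back-substitution for the even-indexed unknowns.
    # xp is x padded with one zero on each side, so xp[i + 1] holds x[i].
    xp = np.zeros(n + 2)
    xp[odd + 1] = x_odd
    even = np.arange(0, n, 2)
    xp[even + 1] = (d[even] - a[even] * xp[even] - c[even] * xp[even + 2]) / b[even]
    return xp[1:-1]
```

The sketch assumes the pivots `b[i]` stay nonzero during elimination, which holds for diagonally dominant systems. Whether the Boussinesq elliptic system satisfies this is not stated in the abstract.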
3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

1997-12-31

The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
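The comparison at the heart of relative debugging can be illustrated with a small sketch. It is a hypothetical stand-in, not the system described in this record. It assumes each run has already produced a trace of named checkpoints, and that the parallel run stores each array as a list of contiguous blocks, one per process. The names `gather_blocks` and `first_divergence` are invented for the example.

```python
import numpy as np


def gather_blocks(blocks):
    """Reassemble an array that was block-distributed across processes."""
    return np.concatenate(blocks)


def first_divergence(serial_trace, parallel_trace, rtol=1e-12):
    """Return (index, name) of the first checkpoint where the runs differ, else None.

    serial_trace:   list of (name, array) pairs from the serial run.
    parallel_trace: list of (name, [array per process]) pairs from the parallel run.
    """
    for index, ((name, expected), (_, blocks)) in enumerate(
        zip(serial_trace, parallel_trace)
    ):
        expected = np.asarray(expected)
        actual = gather_blocks(blocks)
        same_shape = actual.shape == expected.shape
        if not same_shape or not np.allclose(actual, expected, rtol=rtol, atol=0.0):
            return index, name
    return None
```

If the serial run is taken as the reference, the first checkpoint reported is where a parallelization error would first become visible. A real tool would also need the distribution information supplied by the parallelization tool, rather than the fixed block layout assumed here.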
Relative Debugging of Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2002-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
Paralex: An Environment for Parallel Programming in Distributed Systems

DTIC Science & Technology

1991-12-07

distributed systems is comparable to assembly language programming for traditional sequential systems - the user must resort to low-level primitives ...to accomplish data encoding/decoding, communication, remote execution, synchronization, failure detection and recovery. It is our belief that... synchronization. Finally, composing parallel programs by interconnecting sequential computations allows automatic support for heterogeneity and fault tolerance
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.
Parallel O(N) Stokes’ solver towards scalable Brownian dynamics of hydrodynamically interacting objects in general geometries

DOE PAGES

Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...

2017-06-29

An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.
Emulsion droplet interactions: a front-tracking treatment

NASA Astrophysics Data System (ADS)

Mason, Lachlan; Juric, Damir; Chergui, Jalel; Shin, Seungwon; Craster, Richard V.; Matar, Omar K.

2017-11-01

Emulsion coalescence influences a multitude of industrial applications including solvent extraction, oil recovery and the manufacture of fast-moving consumer goods. Droplet interaction models are vital for the design and scale-up of processing systems; however, predictive modelling at the droplet scale remains a research challenge. This study simulates industrially relevant moderate-inertia collisions for which a high degree of droplet deformation occurs. A hybrid front-tracking/level-set approach is used to automatically account for interface merging without the need for 'bookkeeping' of interface connectivity. The model is implemented in Code BLUE using a parallel multi-grid solver, allowing both film and droplet-scale dynamics to be resolved efficiently. Droplet interaction simulations are validated using experimental sequences from the literature in the presence and absence of background turbulence. The framework is readily extensible for modelling the influence of surfactants and non-Newtonian fluids on droplet interaction processes. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM), PETRONAS.
Parallels between Global Transcriptional Programs of Polarizing Caco-2 Intestinal Epithelial Cells In Vitro and Gene Expression Programs in Normal Colon and Colon Cancer

PubMed Central

Sääf, Annika M.; Halbleib, Jennifer M.; Chen, Xin; Yuen, Siu Tsan; Leung, Suet Yi

2007-01-01

Posttranslational mechanisms are implicated in the development of epithelial cell polarity, but little is known about the patterns of gene expression and transcriptional regulation during this process. We characterized temporal patterns of gene expression during cell–cell adhesion-initiated polarization of cultured human Caco-2 cells, which develop structural and functional polarity resembling enterocytes in vivo. A distinctive switch in gene expression patterns occurred upon formation of cell–cell contacts. Comparison to gene expression patterns in normal human colon and colon tumors revealed that the pattern in proliferating, nonpolarized Caco-2 cells paralleled patterns seen in human colon cancer in vivo, including expression of genes involved in cell proliferation. The pattern switched in polarized Caco-2 cells to one more closely resembling that in normal colon tissue, indicating that regulation of transcription underlying Caco-2 cell polarization is similar to that during enterocyte differentiation in vivo. Surprisingly, the temporal program of gene expression in polarizing Caco-2 cells involved changes in signaling pathways (e.g., Wnt, Hh, BMP, FGF) in patterns similar to those during migration and differentiation of intestinal epithelial cells in vivo, despite the absence of morphogen gradients and interactions with stromal cells characteristic of enterocyte differentiation in situ. The full data set is available at http://microarray-pubs.stanford.edu/CACO2. PMID:17699589
Interfacing Computer Aided Parallelization and Performance Analysis

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

2003-01-01

When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
Cavity-photon contribution to the effective interaction of electrons in parallel quantum dots

NASA Astrophysics Data System (ADS)

Gudmundsson, Vidar; Sitek, Anna; Abdullah, Nzar Rauf; Tang, Chi-Shung; Manolescu, Andrei

2016-05-01

A single cavity photon mode is expected to modify the Coulomb interaction of an electron system in the cavity. Here we investigate this phenomenon in a parallel double quantum dot system. We explore properties of the closed system and the system after it has been opened up for electron transport. We show how results for both cases support the idea that the effective electron-electron interaction becomes more repulsive in the presence of a cavity photon field. This can be understood in terms of the cavity photons dressing the polarization terms in the effective mutual electron interaction leading to nontrivial delocalization or polarization of the charge in the double parallel dot potential. In addition, we find that the effective repulsion of the electrons can be reduced by quadrupolar collective oscillations excited by an external classical dipole electric field.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.
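For orientation, the kind of problem and linear system described in this abstract can be written in a commonly used textbook form. The symbols $A$, $b$, $c$, $x$, $s$, $D$, $\Delta y$ and $r$ are notation assumed here for illustration, and the normal-equations form shown is one common choice in interior-point codes; the report itself may formulate the system differently.

```latex
% Standard-form linear program (assumed notation)
\[
  \min_{x} \; c^{T} x
  \quad \text{subject to} \quad A x = b, \quad x \ge 0 .
\]
% One common form of the sparse system solved at each interior-point iteration,
% with x the primal iterate and s the dual slack vector (both strictly positive)
\[
  \left( A D A^{T} \right) \Delta y = r,
  \qquad D = \operatorname{diag}\!\left( x_i / s_i \right).
\]
```

When $A$ has full row rank, $A D A^{T}$ is symmetric positive definite, which is why both direct (sparse Cholesky) and iterative (conjugate-gradient-type) solvers are candidates for this step.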
Kentucky DOE EPSCoR Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grulke, Eric; Stencel, John

2011-09-13

The KY DOE EPSCoR Program supports two research clusters. The Materials Cluster uses unique equipment and computational methods that involve research expertise at the University of Kentucky and University of Louisville. This team determines the physical, chemical and mechanical properties of nanostructured materials and examines the dominant mechanisms involved in the formation of new self-assembled nanostructures. State-of-the-art parallel computational methods and algorithms are used to overcome current limitations of processing that otherwise are restricted to small system sizes and short times. The team also focuses on developing and applying advanced microtechnology fabrication techniques and the application of microelectromechanical systems (MEMS) for creating new materials, novel microdevices, and integrated microsensors. The second research cluster concentrates on High Energy and Nuclear Physics. It connects research and educational activities at the University of Kentucky, Eastern Kentucky University and national DOE research laboratories. Its vision is to establish world-class research status dedicated to experimental and theoretical investigations in strong interaction physics. The research provides a forum, facilities, and support for scientists to interact and collaborate in subatomic physics research. The program enables increased student involvement in fundamental physics research through the establishment of graduate fellowships and collaborative work.
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Dual and parallel postdoctoral training programs: implications for the osteopathic medical profession.

PubMed

Burkhart, Diane N; Lischka, Terri A

2011-04-01

Students in colleges of osteopathic medicine have several options when considering postdoctoral training programs. In addition to training programs approved solely by the American Osteopathic Association or accredited solely by the Accreditation Council for Graduate Medical Education (ACGME), students can pursue programs accredited by both organizations (ie, dually accredited programs) or osteopathic programs that occur side-by-side with ACGME programs (ie, parallel programs). In the present article, we report on the availability and growth of these 2 training options and describe their benefits and drawbacks for trainees and the osteopathic medical profession as a whole.

Monte Carlo simulation of biomolecular systems with BIOMCSIM

NASA Astrophysics Data System (ADS)

Kamberaj, H.; Helms, V.

2001-12-01

A new Monte Carlo simulation program, BIOMCSIM, is presented that has been developed in particular to simulate the behaviour of biomolecular systems, leading to insights and understanding of their functions. The computational complexity in Monte Carlo simulations of high density systems, with large molecules like proteins immersed in a solvent medium, or when simulating the dynamics of water molecules in a protein cavity, is enormous. The program presented in this paper seeks to provide these desirable features putting special emphasis on simulations in grand canonical ensembles. It uses different biasing techniques to increase the convergence of simulations, and periodic load balancing in its parallel version, to maximally utilize the available computer power. In periodic systems, the long-ranged electrostatic interactions can be treated by Ewald summation. The program is modularly organized, and implemented using an ANSI C dialect, so as to enhance its modifiability. Its performance is demonstrated in benchmark applications for the proteins BPTI and Cytochrome c Oxidase.
The paradigm compiler: Mapping a functional language for the connection machine

NASA Technical Reports Server (NTRS)

Dennis, Jack B.

1989-01-01

The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

1997-01-01

In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

DOE Office of Scientific and Technical Information (OSTI.GOV)

The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
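To make the dynamic-programming idea concrete, the sketch below solves the maximum weighted independent set problem on a tree, which is the simplest (treewidth 1) special case of the tree-decomposition approach. It is an illustration only and is not taken from INDDGO; the function name `max_weight_independent_set` and its arguments are hypothetical.

```python
from collections import defaultdict


def max_weight_independent_set(n, edges, weight):
    """Maximum total weight of an independent set in a tree.

    Assumes vertices 0..n-1 (n >= 1), `edges` describes a connected tree,
    and `weight[v]` is the weight of vertex v.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS from vertex 0: records each vertex's parent and a
    # visiting order in which every parent precedes its children.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)

    take = list(weight)  # best weight in u's subtree when u is in the set
    skip = [0] * n       # best weight in u's subtree when u is left out
    for u in reversed(order):  # children are finished before their parent
        p = parent[u]
        if p >= 0:
            take[p] += skip[u]                 # parent taken: child must be skipped
            skip[p] += max(take[u], skip[u])   # parent skipped: child is free
    return max(take[0], skip[0])
```

On general graphs the same two-state table generalizes to one entry per independent subset of each bag of the tree decomposition, which is where the cost grows with treewidth.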
Exploiting loop level parallelism in nonprocedural dataflow programs

NASA Technical Reports Server (NTRS)

Gokhale, Maya B.

1987-01-01

Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.
Tolerant (parallel) Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

USDA-ARS's Scientific Manuscript database

This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
ISS and TPD study of the adsorption and interaction of CO and H2 on polycrystalline Pt

NASA Technical Reports Server (NTRS)

Melendez, Orlando; Hoflund, Gar B.; Schryer, David R.

1990-01-01

The adsorption and interaction of CO and H2 on polycrystalline Pt has been studied using ion scattering spectroscopy (ISS) and temperature programmed desorption (TPD). The ISS results indicate that the initial CO adsorption on Pt takes place very rapidly and saturates the Pt surface with coverage close to a monolayer. ISS also shows that the CO molecules adsorb at an angular orientation from the surface normal and perhaps parallel to the surface. A TPD spectrum obtained after coadsorbing C-12 O-16 and C-13 O-18 on Pt shows no isotopic mixing, which is indicative of molecular CO adsorption. TPD spectra obtained after coadsorbing H2 and CO on polycrystalline Pt provide evidence for the formation of a CO-H surface species.
Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.
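The claim that futures and parallel iterators can be built from lambda expressions, processes and semaphores alone can be illustrated in another language. The sketch below is a Python analogue, not Smalltalk and not the thesis code: Python threads stand in for Smalltalk processes, and the names `Future` and `parallel_map` are hypothetical.

```python
import threading


class Future:
    """Runs a zero-argument callable in its own thread; value() waits for it."""

    def __init__(self, thunk):
        self._done = threading.Semaphore(0)  # starts closed: no result yet
        self._value = None
        threading.Thread(target=self._run, args=(thunk,)).start()

    def _run(self, thunk):
        try:
            self._value = thunk()
        finally:
            self._done.release()  # signal completion, even if thunk raised

    def value(self):
        self._done.acquire()   # block until the result is available
        self._done.release()   # reopen so later calls return immediately
        return self._value


def parallel_map(fn, items):
    """A parallel iterator built only from Future: one thread per item."""
    futures = [Future(lambda x=x: fn(x)) for x in items]
    return [f.value() for f in futures]
```

For example, `parallel_map(lambda x: x * x, [1, 2, 3])` collects the squares in input order. This sketch returns `None` for a computation that raised; a fuller version would capture and re-raise the exception.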
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load imbalance at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference: Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195.
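The default even distribution described above can be sketched as follows. This is written in Python rather than PM, and it is only a reading of the abstract: the function names `split_evenly` and `enter_parallel_statement` are hypothetical, and the case where the number of tasks equals the number of processors (not spelled out in the abstract) is treated here as a one-to-one assignment.

```python
def split_evenly(count, parts):
    """Split range(count) into `parts` contiguous chunks; sizes differ by at most one."""
    base, extra = divmod(count, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def enter_parallel_statement(processors, n_tasks):
    """Return {task id: list of processors owned by that task} (n_tasks >= 1)."""
    n_proc = len(processors)
    if n_tasks <= n_proc:
        # Fewer tasks than processors: divide the processors among the tasks.
        groups = split_evenly(n_proc, n_tasks)
        return {t: [processors[i] for i in groups[t]] for t in range(n_tasks)}
    # More tasks than processors: divide the tasks among the processors,
    # so each task owns exactly one processor and processors are shared.
    groups = split_evenly(n_tasks, n_proc)
    return {t: [processors[p]] for p in range(n_proc) for t in groups[p]}
```

Nesting falls out of the same function: calling it again on the processor list owned by one task subdivides that set further, for example `enter_parallel_statement(owned[0], 2)` after `owned = enter_parallel_statement(list(range(8)), 3)`.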
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions

NASA Astrophysics Data System (ADS)

Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin

2017-01-01

We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
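As a reminder of what a split-operator step with FFTs looks like, the generic second-order (Strang) form for a Hamiltonian $H = T + V$ in atomic units is given below. This is the textbook scheme for a scalar potential $V$, with $\mathcal{F}$ denoting the Fourier transform and $\mathbf{k}$ the momentum-space grid; the notation is assumed here, and the solver's actual splitting for a vector-potential coupling may differ.

```latex
\[
  \psi(\mathbf{r}, t + \Delta t) \approx
  e^{-i V \Delta t / 2}\;
  \mathcal{F}^{-1}\!\left[
    e^{-i |\mathbf{k}|^{2} \Delta t / 2}\;
    \mathcal{F}\!\left[ e^{-i V \Delta t / 2}\, \psi(\mathbf{r}, t) \right]
  \right],
  \qquad \text{local error } O(\Delta t^{3}).
\]
```

Each factor is diagonal in the representation where it is applied (position space for $V$, momentum space for $T = |\mathbf{k}|^{2}/2$), so the cost per step is dominated by the two FFTs, which is also where the parallel decomposition of the 3D grid matters.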
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.

PubMed

Catarinucci, Luca; Tarricone, Luciano

2009-01-01

The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
On the dimensionally correct kinetic theory of turbulence for parallel propagation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaelzer, R.; Ziebell, L. F.; Yoon, P. H. (E-mail: rudi.gaelzer@ufrgs.br, yoonp@umd.edu, 007gasun@khu.ac.kr, luiz.ziebell@ufrgs.br)

2015-03-15

Yoon and Fang [Phys. Plasmas 15, 122312 (2008)] formulated a second-order nonlinear kinetic theory that describes the turbulence propagating in directions parallel/anti-parallel to the ambient magnetic field. Their theory also includes discrete-particle effects, or the effects due to spontaneously emitted thermal fluctuations. However, terms associated with the spontaneous fluctuations in particle and wave kinetic equations in their theory contain proper dimensionality only for an artificial one-dimensional situation. The present paper extends the analysis and re-derives the dimensionally correct kinetic equations for the three-dimensional case. The new formalism properly describes the effects of spontaneous fluctuations emitted in three-dimensional space, while the collectively emitted turbulence propagates predominantly in directions parallel/anti-parallel to the ambient magnetic field. As a first step, the present investigation focuses on linear wave-particle interaction terms only. A subsequent paper will include the dimensionally correct nonlinear wave-particle interaction terms.
Parallel transformation of K-SVD solar image denoising algorithm

NASA Astrophysics Data System (ADS)

Liang, Youwen; Tian, Yu; Li, Mei

2017-02-01

Images obtained by observing the sun through a large telescope always suffer from noise because of the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, owing to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data-parallel model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. The parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easy to port to multi-core platforms.
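The reported figure of a 13.563 speedup on 16 cores can be put in context with two standard derived quantities: parallel efficiency (speedup divided by core count) and the Karp-Flatt experimentally determined serial fraction. The short sketch below computes both; the function names are chosen here for illustration, and the interpretation assumes the speedup is measured against the serial version on one core.

```python
def efficiency(speedup, cores):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / cores


def karp_flatt_serial_fraction(speedup, cores):
    """Karp-Flatt metric e = (1/S - 1/p) / (1 - 1/p) for speedup S on p cores."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


eff = efficiency(13.563, 16)
serial = karp_flatt_serial_fraction(13.563, 16)
print(f"efficiency = {eff:.3f}, serial fraction = {serial:.4f}")
```

With the abstract's numbers this gives an efficiency of roughly 85% and an effective serial fraction of roughly 1.2%, which lumps together genuinely serial code and parallel overhead.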
Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

PubMed

Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

2016-01-01

Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
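The execution model described here (processors run their own copy of the program independently, and all of them finish one task before any proceeds to the next when a synchronization instruction is issued) corresponds to what is now usually called barrier synchronization. The sketch below is a software analogue using Python threads, not a model of the patented hardware; `run_array` and its parameters are hypothetical names.

```python
import threading


def run_array(n_processors, n_phases):
    """Run n_processors threads through n_phases, synchronizing after each phase."""
    barrier = threading.Barrier(n_processors)
    lock = threading.Lock()
    log = []

    def processor(pid):
        for phase in range(n_phases):
            with lock:
                log.append((phase, pid))  # independent work for this phase
            barrier.wait()                # array-wide synchronization point

    threads = [threading.Thread(target=processor, args=(p,))
               for p in range(n_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Within a phase the order of entries in the log is arbitrary, reflecting that the processors are not in lock step, but no entry for phase k+1 can appear before every processor has logged phase k.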
A parallel solver for huge dense linear systems

NASA Astrophysics Data System (ADS)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100 000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.
New version program summary
Program title: Huge Dense System Solver (HDSS)
Catalogue identifier: AEHU_v1_1
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 87 062
No. of bytes in distributed program, including test data, etc.: 1 069 110
Distribution format: tar.gz
Programming language: Fortran90, C
Computer: Parallel architectures: multiprocessors, computer clusters
Operating system: Linux/Unix
Has the code been vectorized or parallelized?: Yes, includes MPI primitives.
RAM: Tested for up to 190 GB
Classification: 6.5
External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution).
Catalogue identifier of previous version: AEHU_v1_0
Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533
Does the new version supersede the previous version?: Yes
Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities.
Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient.
Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic.
Summary of revisions: Version 1.1 can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment.
Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
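A back-of-the-envelope storage estimate shows why out-of-core strategies matter at this scale. The sketch below takes the lower bounds quoted in the summary (200 000 equations, 10 000 right-hand sides) and assumes 8 bytes per double-precision entry; the helper name `dense_matrix_bytes` is hypothetical, and whether the 190 GB RAM figure refers to the same run as the timing result is not stated in the summary.

```python
def dense_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Storage for a dense rows x cols matrix (8 bytes = one double-precision value)."""
    return rows * cols * bytes_per_entry


n, nrhs = 200_000, 10_000            # lower bounds quoted in the summary
a_bytes = dense_matrix_bytes(n, n)   # coefficient matrix A
b_bytes = dense_matrix_bytes(n, nrhs)  # right-hand sides B

print(f"A: {a_bytes / 1e9:.0f} GB, B: {b_bytes / 1e9:.0f} GB")
print("A alone exceeds 190 GB of RAM:", a_bytes > 190e9)
```

Under these assumptions the coefficient matrix alone needs 320 GB (decimal) and the right-hand sides another 16 GB, so the matrix cannot be held entirely in 190 GB of main memory and must be staged through secondary storage during the LU factorization.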
Concurrency-based approaches to parallel programming

NASA Technical Reports Server (NTRS)

Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.

1995-01-01

The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.
Reliability models for dataflow computer systems

NASA Technical Reports Server (NTRS)

Kavi, K. M.; Buckles, B. P.

1985-01-01

The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.

Parallel community climate model: Description and user's guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
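The decomposition into geographical patches can be illustrated with a small sketch that splits a latitude-longitude grid into rectangular blocks, one per processor. This is a simplified illustration rather than the PCCM2 scheme itself: the function name `decompose`, the grid and processor-mesh sizes used in the example, and the one-patch-per-processor mapping are all assumptions.

```python
def decompose(n_lat, n_lon, p_lat, p_lon):
    """Split an n_lat x n_lon grid into p_lat x p_lon rectangular patches.

    Returns {rank: ((lat_start, lat_stop), (lon_start, lon_stop))} with
    half-open index ranges; patch sizes differ by at most one row or column.
    """
    def bounds(n, parts):
        base, extra = divmod(n, parts)
        out, start = [], 0
        for i in range(parts):
            stop = start + base + (1 if i < extra else 0)
            out.append((start, stop))
            start = stop
        return out

    lat_b, lon_b = bounds(n_lat, p_lat), bounds(n_lon, p_lon)
    return {i * p_lon + j: (lat_b[i], lon_b[j])
            for i in range(p_lat) for j in range(p_lon)}


# Illustrative sizes only: a 64 x 128 grid on a 4 x 8 processor mesh.
patches = decompose(64, 128, 4, 8)
```

Column physics needs only the grid points inside a patch, which is why that part of the computation requires no communication; the transforms used in the dynamics are the part that needs data from other patches.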
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
The Goddard Space Flight Center Program to develop parallel image processing systems

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1972-01-01

Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
Parallel Volunteer Learning during Youth Programs

ERIC Educational Resources Information Center

Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

2012-01-01

Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…
Mechanism to support generic collective communication across a variety of programming models

DOEpatents

Almasi, Gheorghe [Ardsley, NY]; Dozsa, Gabor [Ardsley, NY]; Kumar, Sameer [White Plains, NY]

2011-07-19

A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
Verification and Planning Based on Coinductive Logic Programming

NASA Technical Reports Server (NTRS)

Bansal, Ajay; Min, Richard; Simon, Luke; Mallya, Ajay; Gupta, Gopal

2008-01-01

Coinduction is a powerful technique for reasoning about unfounded sets, unbounded structures, infinite automata, and interactive computations [6]. Where induction corresponds to least fixed point semantics, coinduction corresponds to greatest fixed point semantics. Recently coinduction has been incorporated into logic programming and an elegant operational semantics developed for it [11, 12]. This operational semantics is the greatest fixed point counterpart of SLD resolution (SLD resolution imparts operational semantics to least fixed point based computations) and is termed co-SLD resolution. In co-SLD resolution, a predicate goal p(t) succeeds if it unifies with one of its ancestor calls. In addition, rational infinite terms are allowed as arguments of predicates. Infinite terms are represented as solutions to unification equations and the occurs check is omitted during the unification process. Coinductive Logic Programming (Co-LP) and Co-SLD resolution can be used to elegantly perform model checking and planning. A combined SLD and Co-SLD resolution based LP system forms the common basis for planning, scheduling, verification, model checking, and constraint solving [9, 4]. This is achieved by amalgamating SLD resolution, co-SLD resolution, and constraint logic programming [13] in a single logic programming system. Given that parallelism in logic programs can be implicitly exploited [8], complex, compute-intensive applications (planning, scheduling, model checking, etc.) can be executed in parallel on multi-core machines. Parallel execution can result in speed-ups as well as in larger instances of the problems being solved. In the remainder we elaborate on (i) how planning can be elegantly and efficiently performed under real-time constraints, (ii) how real-time systems can be elegantly and efficiently model-checked, as well as (iii) how hybrid systems can be verified in a combined system with both co-SLD and SLD resolution.
Implementations of co-SLD resolution as well as preliminary implementations of the planning and verification applications have been developed [4]. Co-LP and Model Checking: The vast majority of properties that are to be verified can be classified into safety properties and liveness properties. It is well known within model checking that safety properties can be verified by reachability analysis, i.e., if a counter-example to the property exists, it can be finitely determined by enumerating all the reachable states of the Kripke structure.
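The reachability argument for safety properties can be sketched in a few lines of Python. This is a plain breadth-first search over an explicit state graph, not the co-SLD based system the abstract describes; the toy transition table and the name `find_counterexample` are assumptions made for illustration.

```python
from collections import deque

def find_counterexample(initial, successors, is_bad):
    """Enumerate reachable states breadth-first.

    Returns a path from `initial` to a state violating the safety property,
    or None if no reachable state is bad.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            path = []
            while state is not None:   # walk parent links back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:      # visit each state once
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical toy system: even and odd states form two separate cycles.
transitions = {0: [2], 2: [4], 4: [0], 1: [3], 3: [5], 5: [1]}

def succ(state):
    return transitions[state]

safe = find_counterexample(0, succ, lambda s: s == 3)    # state 3 is unreachable from 0
unsafe = find_counterexample(0, succ, lambda s: s == 4)  # state 4 is reachable
print(safe, unsafe)
```

When the state space is finite the search terminates, so a counter-example, if one exists, is found after finitely many steps.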
Programming with Intervals

NASA Astrophysics Data System (ADS)

Matsakis, Nicholas D.; Gross, Thomas R.

Intervals are a new, higher-level primitive for parallel programming with which programmers directly construct the program schedule. Programs using intervals can be statically analyzed to ensure that they do not deadlock or contain data races. In this paper, we demonstrate the flexibility of intervals by showing how to use them to emulate common parallel control-flow constructs like barriers and signals, as well as higher-level patterns such as bounded-buffer producer-consumer. We have implemented intervals as a publicly available library for Java and Scala.
Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

NASA Astrophysics Data System (ADS)

Ramirez, Andres; Rahnemoonfar, Maryam

2017-04-01

A hyperspectral image provides a multidimensional, data-rich figure consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in high computational time. To overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the use of parallel hardware and a parallel programming model, which is simpler to handle than other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy results and timing results between the Hadoop and GPU systems and tested them against the following test cases: the CPU and GPU test case, a CPU test case and a test case where no dimensional reduction was applied.
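For readers unfamiliar with the MapReduce model mentioned here, the following sequential Python sketch shows its three phases on a toy pixel list. It is neither the authors' GPU system nor Hadoop: the "brightest band" classifier, the sample spectra and all function names are assumptions used only to show the shape of the model.

```python
from collections import defaultdict
from functools import reduce

def map_phase(records, mapper):
    """Apply the mapper to every record and collect the (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Fold each key's values into one result with the reducer."""
    return {key: reduce(reducer, values) for key, values in groups.items()}

# Hypothetical data: (pixel id, spectrum with three bands).
pixels = [(0, [0.9, 0.1, 0.0]), (1, [0.2, 0.7, 0.1]),
          (2, [0.8, 0.1, 0.1]), (3, [0.0, 0.2, 0.8])]

def label_pixel(record):
    """Stand-in classifier (an assumption): class = index of the brightest band."""
    _, spectrum = record
    return [(spectrum.index(max(spectrum)), 1)]

counts = reduce_phase(shuffle(map_phase(pixels, label_pixel)), lambda a, b: a + b)
print(counts)
```

Because each mapper call touches one pixel and each reducer call touches one key, both phases can be spread over parallel hardware without changing the user-supplied functions.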
Revision of FMM-Yukawa: An adaptive fast multipole method for screened Coulomb interactions

NASA Astrophysics Data System (ADS)

Zhang, Bo; Huang, Jingfang; Pitsianis, Nikos P.; Sun, Xiaobai

2010-12-01

FMM-YUKAWA is a mathematical software package primarily for rapid evaluation of the screened Coulomb interactions of N particles in three dimensional space. Since its release, we have revised and re-organized the data structure, software architecture, and user interface, for the purpose of enabling more flexible, broader and easier use of the package. The package and its documentation are available at http://www.fastmultipole.org/, along with a few other closely related mathematical software packages. New version program summary. Program title: FMM-Yukawa. Catalogue identifier: AEEQ_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEQ_v2_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GNU GPL 2.0. No. of lines in distributed program, including test data, etc.: 78 704. No. of bytes in distributed program, including test data, etc.: 854 265. Distribution format: tar.gz. Programming language: FORTRAN 77, FORTRAN 90, and C. Requires gcc and gfortran version 4.4.3 or later. Computer: All. Operating system: Any. Classification: 4.8, 4.12. Catalogue identifier of previous version: AEEQ_v1_0. Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2331. Does the new version supersede the previous version?: Yes. Nature of problem: To evaluate the screened Coulomb potential and force field of N charged particles, and to evaluate a convolution type integral where the Green's function is the fundamental solution of the modified Helmholtz equation. Solution method: The new version of fast multipole method (FMM) that diagonalizes the multipole-to-local translation operator is applied with the tree structure adaptive to sample particle locations. Reasons for new version: To handle much larger particle ensembles, to enable the iterative use of the subroutines in a solver, and to remove potential contention in assignments for parallelization.
Summary of revisions: The software package FMM-Yukawa has been revised and re-organized in data structure, software architecture, programming methods, and user interface. The revision enables more flexible use of the package and economic use of memory resources. It consists of five stages. The initial stage (stage 1) determines, based on the accuracy requirement and FMM theory, the length of multipole expansions and the number of quadrature points for diagonalization, and loads the quadrature nodes and weights that are computed off line. Stage 2 constructs the oct-tree and interaction lists, with adaptation to the sparsity or density of particles and employing a dynamic memory allocation scheme at every tree level. Stage 3 executes the core FMM subroutine for numerical calculation of the particle interactions. The subroutine can now be used iteratively as in a solver, while the particle locations remain the same. Stage 4 releases the memory allocated in Stage 2 for the adaptive tree and interaction lists. The user can modify the iterative routine easily. When the particle locations are changed such as in a molecular dynamics simulation, stage 2 to 4 can also be used together repeatedly. The final stage releases the memory space used for the quadrature and other remaining FMM parameters. Programs at the stage level and at the user interface are re-written in the C programming language, while most of the translation and interaction operations remain in FORTRAN. As a result of the change in data structures and memory allocation, the revised package can accommodate much larger particle ensembles while maintaining the same accuracy-efficiency performance. The new version is also developed as an important precursor to its parallel counterpart on multi-core or many core processors in a shared memory programming environment. 
Particularly, in order to ensure mutual exclusion in concurrent updates without incurring extra latency, we have replaced all the assignment statements at a source box that put its data to multiple target boxes with assignments at every target box that gather data from source boxes. This amounts to replacing the column version of matrix-vector multiplication with the row version. The matrix here, however, is in compressive representation. Sufficient care is taken in the revision not to alter the algorithmic complexity or numerical behavior, as concurrent writing potentially takes place in the upward calculation of the multipole expansion coefficients, interactions at every level of the FMM tree, and downward calculation of the local expansion coefficients. The software modules and their compositions are also organized according to the stages they are used. Demonstration files and makefiles for merging the user routines and the library routines are provided. Restrictions: Accuracy requirement is described in terms of three or six digits. Higher multiples of three digits will be allowed in a later version. Finer decimation in digits for accuracy specification may or may not be necessary. Unusual features: Ready and friendly for customized use and instrumental in expression of concurrency and dependency for efficient parallelization. Running time: The running time depends linearly on the number N of particles, and varies with the distribution characteristics of the particle distribution. It also depends on the accuracy requirement, a higher accuracy requirement takes relatively longer time. The code outperforms the direct summation method when N⩾750.
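The switch from source-side "put" assignments to target-side "gather" assignments, which the summary compares to replacing the column version of matrix-vector multiplication with the row version, can be illustrated with a small Python sketch. It is not taken from the FMM-Yukawa package: the sparse storage layout and the names `matvec_scatter` and `matvec_gather` are assumptions.

```python
def matvec_scatter(columns, x, n_rows):
    """Column version: every source j pushes its contribution to many targets.

    Several sources may update the same y[i], so a parallel version of this
    loop would need locks or atomic updates.
    """
    y = [0.0] * n_rows
    for j, column in columns.items():
        for i, a_ij in column:
            y[i] += a_ij * x[j]
    return y

def matvec_gather(rows, x):
    """Row version: every target i pulls from its sources and is written once.

    Each y[i] has a single writer, so targets can be processed concurrently
    without mutual exclusion.
    """
    return [sum(a_ij * x[j] for j, a_ij in row) for row in rows]

# The same matrix A = [[1, 2, 0], [0, 3, 4]] stored by column and by row.
columns = {0: [(0, 1.0)], 1: [(0, 2.0), (1, 3.0)], 2: [(1, 4.0)]}
rows = [[(0, 1.0), (1, 2.0)], [(1, 3.0), (2, 4.0)]]
x = [1.0, 1.0, 2.0]

y_scatter = matvec_scatter(columns, x, n_rows=2)
y_gather = matvec_gather(rows, x)
print(y_scatter, y_gather)
```

Both orderings perform the same multiplications, so the arithmetic cost is unchanged; only the ownership of each write differs.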
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and organizations using multiple storage devices are suggested. Problem areas are also identified and discussed.
Program For Parallel Discrete-Event Simulation

NASA Technical Reports Server (NTRS)

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments

DTIC Science & Technology

2015-11-20

1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever-growing big data community...Lexington, MA, U.S.A Abstract— The map-reduce parallel programming model has become extremely popular in the big data community. Many big data ...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
Debugging Fortran on a shared memory machine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Allen, T.R.; Padua, D.A.

1987-01-01

Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
Parallel Demand-Withdraw Processes in Family Therapy for Adolescent Drug Abuse

PubMed Central

Rynes, Kristina N.; Rohrbaugh, Michael J.; Lebensohn-Chialvo, Florencia; Shoham, Varda

2013-01-01

Isomorphism, or parallel process, occurs in family therapy when patterns of therapist-client interaction replicate problematic interaction patterns within the family. This study investigated parallel demand-withdraw processes in Brief Strategic Family Therapy (BSFT) for adolescent drug abuse, hypothesizing that therapist-demand/adolescent-withdraw interaction (TD/AW) cycles observed early in treatment would predict poor adolescent outcomes at follow-up for families who exhibited entrenched parent-demand/adolescent-withdraw interaction (PD/AW) before treatment began. Participants were 91 families who received at least 4 sessions of BSFT in a multi-site clinical trial on adolescent drug abuse (Robbins et al., 2011). Prior to receiving therapy, families completed videotaped family interaction tasks from which trained observers coded PD/AW. Another team of raters coded TD/AW during two early BSFT sessions. The main dependent variable was the number of drug use days that adolescents reported in Timeline Follow-Back interviews 7 to 12 months after family therapy began. Zero-inflated Poisson (ZIP) regression analyses supported the main hypothesis, showing that PD/AW and TD/AW interacted to predict adolescent drug use at follow-up. For adolescents in high PD/AW families, higher levels of TD/AW predicted significant increases in drug use at follow-up, whereas for low PD/AW families, TD/AW and follow-up drug use were unrelated. Results suggest that attending to parallel demand-withdraw processes in parent/adolescent and therapist/adolescent dyads may be useful in family therapy for substance-using adolescents. PMID:23438248
A portable MPI-based parallel vector template library

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.

1995-01-01

This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
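The three components named in this abstract (a generic collection, generic algorithms over it, and algebraic combining functions) can be mimicked in a short sequential Python sketch. The library itself is a C++ template library on MPI; the `Collection` class below, its chunking rule and its method names are assumptions that only illustrate the programming model.

```python
from functools import reduce

class Collection:
    """Toy collection whose elements are split into one chunk per processor."""

    def __init__(self, items, nprocs):
        items = list(items)
        size = -(-len(items) // nprocs)  # ceiling division
        self.chunks = [items[p * size:(p + 1) * size] for p in range(nprocs)]

    def apply(self, fn):
        """Generic algorithm: element-wise map, done chunk by chunk."""
        out = Collection([], len(self.chunks))
        out.chunks = [[fn(x) for x in chunk] for chunk in self.chunks]
        return out

    def reduce(self, combine, identity):
        """Reduce each chunk locally, then combine the partial results.

        `combine` must be associative for the result to be independent of
        how the elements were distributed.
        """
        partials = [reduce(combine, chunk, identity) for chunk in self.chunks]
        return reduce(combine, partials, identity)

numbers = Collection(range(1, 11), nprocs=4)
squares = numbers.apply(lambda x: x * x)
total = squares.reduce(lambda a, b: a + b, 0)
print(total)
```

In a distributed setting each chunk would live in a different address space and the final combination would be a collective operation; here both steps run in one process.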
Parallel computation and the basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1993-05-01

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
Parallelization of the Flow Field Dependent Variation Scheme for Solving the Triple Shock/Boundary Layer Interaction Problem

NASA Technical Reports Server (NTRS)

Schunk, Richard Gregory; Chung, T. J.

2001-01-01

A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high speed air-breathing vehicles including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme, based upon multi-threading, as implemented on multiple processor supercomputers and workstations, are presented.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
A Comparison of Automatic Parallelization Tools/Compilers on the SGI Origin 2000 Using the NAS Benchmarks

NASA Technical Reports Server (NTRS)

Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry

1998-01-01

Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using some parallelization tools and compilers. In this paper, we compare the performance of the hand-written NAS Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools: an interactive computer aided parallelization tool that generates message passing code, 2) the Portland Group's HPF compiler and 3) using compiler directives with the native FORTRAN 77 compiler on the SGI Origin2000.

The Automated Instrumentation and Monitoring System (AIMS) reference manual

NASA Technical Reports Server (NTRS)

Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

1993-01-01

Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensates for data collection overhead. Besides being used as a prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multicomputers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, X11R5, Motif 1.1.3).
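The division of labour among an instrumentor, a monitoring library and a trace post-processor can be illustrated with a small Python sketch. It has no connection to the AIMS code base: the decorator stands in for source-level instrumentation, the in-memory `TRACE` list stands in for a trace file, and the constant per-event overhead passed to `durations` is a simplifying assumption.

```python
import time
from functools import wraps

TRACE = []  # stand-in for a trace file: (event kind, function name, timestamp)

def instrument(fn):
    """Stand-in for the instrumentor: record an event on entry and on exit."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        TRACE.append(("enter", fn.__name__, time.perf_counter()))
        try:
            return fn(*args, **kwargs)
        finally:
            TRACE.append(("exit", fn.__name__, time.perf_counter()))
    return wrapper

def durations(trace, overhead_per_event=0.0):
    """Stand-in for the post-processor: pair enter/exit events into durations.

    The assumed recording overhead is subtracted from each duration and the
    result is clamped at zero.
    """
    stack, result = [], []
    for kind, name, stamp in trace:
        if kind == "enter":
            stack.append((name, stamp))
        else:
            start_name, start = stack.pop()
            result.append((start_name, max(0.0, stamp - start - overhead_per_event)))
    return result

@instrument
def work(n):
    return sum(i * i for i in range(n))

value = work(1000)
timings = durations(TRACE)
print(value, timings)
```

A real tool would measure the recording overhead on the target machine instead of taking it as a parameter.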
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE PAGES

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...

2016-06-01

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
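The idea of mapping global data onto a set of processors can be made concrete with a small Python sketch of a block distribution. This is not XMP syntax and does not reproduce XMP's directive semantics: the block rule and the helper names `block_size`, `owner_of` and `local_range` are assumptions chosen for illustration.

```python
def block_size(n_global, nprocs):
    """Elements per processor under a block distribution (ceiling division)."""
    return -(-n_global // nprocs)

def owner_of(i, n_global, nprocs):
    """Translate global index i into (owning processor, local index)."""
    b = block_size(n_global, nprocs)
    return i // b, i % b

def local_range(p, n_global, nprocs):
    """Global indices owned by processor p; the last block may be short."""
    b = block_size(n_global, nprocs)
    return range(min(p * b, n_global), min((p + 1) * b, n_global))

# Ten grid points over four processors: blocks of 3, 3, 3 and 1.
print(owner_of(7, 10, 4))
print([list(local_range(p, 10, 4)) for p in range(4)])
```

In a global-view program the user writes loops over global indices and the compiler or runtime performs this translation; in a local-view program the user works with the local indices directly.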
IPython: components for interactive and parallel computing across disciplines. (Invited)

NASA Astrophysics Data System (ADS)

Perez, F.; Bussonnier, M.; Frederic, J. D.; Froehle, B. M.; Granger, B. E.; Ivanov, P.; Kluyver, T.; Patterson, E.; Ragan-Kelley, B.; Sailer, Z.

2013-12-01

Scientific computing is an inherently exploratory activity that requires constantly cycling between code, data and results, each time adjusting the computations as new insights and questions arise. To support such a workflow, good interactive environments are critical. The IPython project (http://ipython.org) provides a rich architecture for interactive computing with: 1. Terminal-based and graphical interactive consoles. 2. A web-based Notebook system with support for code, text, mathematical expressions, inline plots and other rich media. 3. Easy to use, high performance tools for parallel computing. Despite its roots in Python, the IPython architecture is designed in a language-agnostic way to facilitate interactive computing in any language. This allows users to mix Python with Julia, R, Octave, Ruby, Perl, Bash and more, as well as to develop native clients in other languages that reuse the IPython clients. In this talk, I will show how IPython supports all stages in the lifecycle of a scientific idea: 1. Individual exploration. 2. Collaborative development. 3. Production runs with parallel resources. 4. Publication. 5. Education. In particular, the IPython Notebook provides an environment for "literate computing" with a tight integration of narrative and computation (including parallel computing). These Notebooks are stored in a JSON-based document format that provides an "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching.
Parallel Computation of Ocean-Atmosphere-Wave Coupled Storm Surge Model

NASA Astrophysics Data System (ADS)

Kim, K.; Yamashita, T.

2003-12-01

Ocean-atmosphere interactions are very important in the formation and development of tropical storms. These interactions dominate the exchange of heat, momentum, and moisture fluxes. Heat flux is usually computed using a bulk equation, in which the air-sea interface supplies heat energy to the atmosphere and to the storm. Dynamical interaction is most often one-way, with the atmosphere driving the ocean. The winds transfer momentum to both ocean surface waves and ocean currents. Wind waves play an important role in the exchange of momentum, heat, and mass between the atmosphere and the ocean. Storm surges can be considered as changes in mean sea level that result from the frictional stresses of strong winds blowing toward the land and causing setup; the low atmospheric pressure at the centre of the cyclone can additionally raise the sea level. In addition to the rise in water level itself, a wave factor must be considered: the rise of mean sea level due to white-cap wave dissipation. In bounded bodies of water, such as small seas, wind-driven sea level setup is much more serious than the inverted barometer effect, and the effects of wind waves on the wind-driven current play an important role. It is therefore necessary to develop a coupled system of the full spectral third-generation wind-wave model (WAM or WAVEWATCH III), the meso-scale atmosphere model (MM5), and the coastal ocean model (POM) to simulate these physical interactions. Because the components of the coupled system are too heavy for personal computers, a parallel computing system should be developed. In this study, we first developed the coupling system of the atmosphere model, ocean wave model, and coastal ocean model on a Beowulf system for the simulation of storm surge. It was applied to the simulation of the storm surge caused by Typhoon Bart (T9918) in the Yatsushiro Sea.
The atmosphere model and the ocean model were parallelized using SPMD methods. The wave-current interface model was developed by defining the wave breaking stresses. We also developed a coupling program to collect and distribute the exchanged data within the parallel system. All models and the coupler execute at the same time, each performing its own computations and passing data to the others. MPMD programming was used to couple the models. The coupler and each model are assigned to separate process groups and compute within their own group; data are exchanged by message passing at the global level. The data are exchanged every 60 seconds of model time, which is the least common multiple of the time steps of the atmosphere model, the wave model, and the ocean model. The model was applied to the storm surge simulation in the Yatsushiro Sea, where the numerical model that did not include the wave breaking stress could not reproduce the observed maximum surge height. It is confirmed that the simulation that includes the wave breaking stress effects can reproduce the observed maximum height, 450 cm, at Matsuai.
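The choice of a coupling interval as the least common multiple of the component time steps can be sketched in a few lines. The abstract states only the resulting 60-second interval; the individual time steps used below (30 s, 60 s, 20 s) are hypothetical values chosen so that the arithmetic reproduces that interval, and the function names are illustrative.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def exchange_interval(time_steps):
    """Smallest model time at which every component finishes a whole step."""
    return reduce(lcm, time_steps.values())

def steps_per_exchange(time_steps):
    """Number of internal steps each component takes between data exchanges."""
    interval = exchange_interval(time_steps)
    return {name: interval // dt for name, dt in time_steps.items()}

# Hypothetical time steps in seconds; the source does not list them.
time_steps = {"atmosphere": 30, "wave": 60, "ocean": 20}
interval = exchange_interval(time_steps)
counts = steps_per_exchange(time_steps)
print(interval, counts)
```

Exchanging at the least common multiple means no component has to be interrupted in the middle of a step when the coupler collects and redistributes data.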
Parent-Child Parallel-Group Intervention for Childhood Aggression in Hong Kong

ERIC Educational Resources Information Center

Fung, Annis L. C.; Tsang, Sandra H. K. M.

2006-01-01

This article reports the original evidence-based outcome study on parent-child parallel group-designed Anger Coping Training (ACT) program for children aged 8-10 with reactive aggression and their parents in Hong Kong. This research program involved experimental and control groups with pre- and post-comparison. Quantitative data collection…
Parallel computer vision

DOE Office of Scientific and Technical Information (OSTI.GOV)

Uhr, L.

1987-01-01

This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
Parallel Performance of a Combustion Chemistry Simulation

DOE PAGES

Skinner, Gregg; Eigenmann, Rudolf

1995-01-01

We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

PubMed Central

Bajaj, Chandrajit

2009-01-01

Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission and storage of isosurfaces. PMID:19756231
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

NASA Technical Reports Server (NTRS)

Steinman, Jeff S.

1992-01-01

Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Algorithms and programming tools for image processing on the MPP, part 2

NASA Technical Reports Server (NTRS)

Reeves, Anthony P.

1986-01-01

A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
Scalable Unix commands for parallel processors : a high-performance implementation.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ong, E.; Lusk, E.; Gropp, W.

2001-06-22

We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1989-01-01

A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.
PISCES 2 users manual

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.

1987-01-01

PISCES 2 is a programming environment and set of extensions to Fortran 77 for parallel programming. It is intended to provide a basis for writing programs for scientific and engineering applications on parallel computers in a way that is relatively independent of the particular details of the underlying computer architecture. This user's manual provides a complete description of the PISCES 2 system as it is currently implemented on the 20 processor Flexible FLEX/32 at NASA Langley Research Center.
A language comparison for scientific computing on MIMD architectures

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.

1989-01-01

Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
Thread concept for automatic task parallelization in image analysis

NASA Astrophysics Data System (ADS)

Lueckenhaus, Maximilian; Eckstein, Wolfgang

1998-09-01

Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
Current correlations for the transport of interacting electrons through parallel quantum dots in a photon cavity

NASA Astrophysics Data System (ADS)

Gudmundsson, Vidar; Abdullah, Nzar Rauf; Sitek, Anna; Goan, Hsi-Sheng; Tang, Chi-Shung; Manolescu, Andrei

2018-06-01

We calculate the current correlations for the steady-state electron transport through multi-level parallel quantum dots embedded in a short quantum wire, that is placed in a non-perfect photon cavity. We account for the electron-electron Coulomb interaction, and the para- and diamagnetic electron-photon interactions with a stepwise scheme of configuration interactions and truncation of the many-body Fock spaces. In the spectral density of the temporal current-current correlations we identify all the transitions, radiative and non-radiative, active in the system in order to maintain the steady state. We observe strong signs of two types of Rabi oscillations.
A Wideband Fast Multipole Method for the two-dimensional complex Helmholtz equation

NASA Astrophysics Data System (ADS)

Cho, Min Hyung; Cai, Wei

2010-12-01

A Wideband Fast Multipole Method (FMM) for the 2D Helmholtz equation is presented. It can evaluate the interactions between N particles governed by the fundamental solution of 2D complex Helmholtz equation in a fast manner for a wide range of complex wave number k, which was not easy with the original FMM due to the instability of the diagonalized conversion operator. This paper includes the description of theoretical backgrounds, the FMM algorithm, software structures, and some test runs. Program summary. Program title: 2D-WFMM. Catalogue identifier: AEHI_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHI_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 4636. No. of bytes in distributed program, including test data, etc.: 82 582. Distribution format: tar.gz. Programming language: C. Computer: Any. Operating system: Any operating system with gcc version 4.2 or newer. Has the code been vectorized or parallelized?: Multi-core processors with shared memory. RAM: Depending on the number of particles N and the wave number k. Classification: 4.8, 4.12. External routines: OpenMP (http://openmp.org/wp/). Nature of problem: Evaluate interaction between N particles governed by the fundamental solution of 2D Helmholtz equation with complex k. Solution method: Multilevel Fast Multipole Algorithm in a hierarchical quad-tree structure with cutoff level which combines low frequency method and high frequency method. Running time: Depending on the number of particles N, wave number k, and number of cores in CPU. CPU time increases as N log N.
Interactive Fringe Analysis System: Applications To Moire Contourogram And Interferogram

NASA Astrophysics Data System (ADS)

Yatagai, T.; Idesawa, M.; Yamaashi, Y.; Suzuki, M.

1982-10-01

A general purpose fringe pattern processing facility was developed in order to analyze moire photographs used for scoliosis diagnoses and interferometric patterns in optical shops. A TV camera reads a fringe profile to be analyzed, and peaks of the fringe are detected by a microcomputer. Fringe peak correction and fringe order determination are performed with the man-machine interactive software developed. A light pen facility and an image digitizer are employed for interaction. In the case of two-dimensional fringe analysis, we analyze independently analysis lines parallel to each other and a reference line perpendicular to the parallel analysis lines. Fringe orders of parallel analysis lines are uniquely determined by using the fringe order of the reference line. Some results of analysis of moire contourograms, interferometric testing of silicon wafers, and holographic measurement of thermal deformation are presented.

Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine, for a problem size of 1024 K equations (256 K grid points).
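A static domain decomposition of the kind mentioned in this abstract can be illustrated with a simple block partition of grid-point indices across nodes. This is a minimal sketch that assumes a one-dimensional contiguous split; the abstract does not describe the actual partitioning geometry, and the function name `block_partition` is hypothetical. The grid size and node count are taken from the abstract (256 K grid points, 512 nodes).

```python
def block_partition(n_points, n_nodes):
    """Return one (start, stop) index range per node, covering range(n_points).

    The first `n_points % n_nodes` nodes receive one extra point so that
    sizes differ by at most one.
    """
    base, extra = divmod(n_points, n_nodes)
    ranges = []
    start = 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# 256 K grid points on 512 nodes, as quoted in the abstract.
parts = block_partition(256 * 1024, 512)
points_per_node = parts[0][1] - parts[0][0]
# 1024 K equations over 256 K grid points gives the unknowns per grid point.
equations_per_point = (1024 * 1024) // (256 * 1024)
print(points_per_node, equations_per_point)
```

Because the partition is computed once and never changes, every node can derive its own range from its rank alone, which fits the SPMD model in which all nodes run the same program.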
Large Scale Analysis of Geospatial Data with Dask and XArray

NASA Astrophysics Data System (ADS)

Zender, C. S.; Hamman, J.; Abernathey, R.; Evans, K. J.; Rocklin, M.

2017-12-01

The analysis of geospatial data with high level languages has accelerated innovation and the impact of existing data resources. However, as datasets grow beyond single-machine memory, data structures within these high level languages can become a bottleneck. New libraries like Dask and XArray resolve some of these scalability issues, providing interactive workflows that are familiar to high-level-language researchers while also scaling out to much larger datasets. This broadens the access of researchers to larger datasets on high performance computers and, through interactive development, reduces time-to-insight when compared to traditional parallel programming techniques (MPI). This talk describes Dask, a distributed dynamic task scheduler, Dask.array, a multi-dimensional array that copies the popular NumPy interface, and XArray, a library that wraps NumPy/Dask.array with labeled and indexed axes, implementing the CF conventions. We discuss both the basic design of these libraries and how they change interactive analysis of geospatial data, and also recent benefits and challenges of distributed computing on clusters of machines.
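The general idea of scaling an array computation by splitting it into chunks and combining partial results can be sketched with the Python standard library alone. This is not Dask's or XArray's API; the function names, the chunk size, and the use of a thread pool are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_ranges(n, chunk_size):
    """Split range(n) into consecutive (start, stop) chunks."""
    return [(i, min(i + chunk_size, n)) for i in range(0, n, chunk_size)]

def partial_sum(data, start, stop):
    """Per-chunk task: return the chunk's sum and its element count."""
    return sum(data[start:stop]), stop - start

def chunked_mean(data, chunk_size, workers=4):
    """Compute a mean by reducing independent per-chunk results."""
    ranges = chunk_ranges(len(data), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda r: partial_sum(data, *r), ranges))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

values = list(range(1, 101))
mean_value = chunked_mean(values, chunk_size=16)
print(mean_value)
```

Since each chunk task only touches its own slice, the tasks could in principle be scheduled on different workers or machines, with only the small partial results travelling back to be combined.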
Ab initio powder x-ray diffraction and PIXEL energy calculations on thiophene derived 1,4 dihydropyridine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karthikeyan, N., E-mail: karthin10@gmail.com; Sivakumar, K.; Pachamuthu, M. P.

We focus on the application of powder diffraction data to ab initio crystal structure determination of thiophene derived 1,4 DHP prepared by cyclocondensation method using solid catalyst. Crystal structure of the compound has been solved by direct-space approach on Monte Carlo search in parallel tempering mode using FOX program. Initial atomic coordinates were derived using Gaussian 09W quantum chemistry software in semi-empirical approach and Rietveld refinement was carried out using GSAS program. The crystal structure of the compound is stabilized by one N-H…O and three C-H…O hydrogen bonds. PIXEL lattice energy calculation was carried out to understand the physical nature of intermolecular interactions in the crystal packing, in which the total lattice energy is partitioned into Coulombic, polarization, dispersion, and repulsion energies.
Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

NASA Astrophysics Data System (ADS)

Shen, Yanfeng; Cesnik, Carlos E. S.

2016-04-01

This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method are introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
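The combination of a penalty contact force with Coulomb stick-slip friction can be sketched generically as follows. This is a textbook-style illustration under stated assumptions, not the LISA formulation from the paper: the sign convention (negative gap means penetration), the linear penalty law, the trial tangential force input, and the name `contact_forces` are all hypothetical.

```python
def contact_forces(gap, tangential_trial, stiffness, mu):
    """Return (normal_force, tangential_force, state) for one contact point.

    gap              : signed distance between crack faces; negative = penetration
    tangential_trial : tangential force that would be needed to prevent sliding
    stiffness        : penalty (contact) stiffness
    mu               : Coulomb friction coefficient
    """
    if gap >= 0.0:
        # Faces are separated: no contact force is transmitted.
        return 0.0, 0.0, "open"
    # Penalty method: the normal force grows linearly with penetration depth.
    normal = stiffness * (-gap)
    limit = mu * normal
    if abs(tangential_trial) <= limit:
        # Friction can hold the faces together: stick.
        return normal, tangential_trial, "stick"
    # Otherwise the tangential force is capped at the Coulomb limit: slip.
    sign = 1.0 if tangential_trial > 0 else -1.0
    return normal, sign * limit, "slip"

print(contact_forces(0.1, 2.0, 20.0, 0.5))
print(contact_forces(-0.5, 2.0, 20.0, 0.5))
print(contact_forces(-0.5, -8.0, 20.0, 0.5))
```

A larger penalty stiffness reduces the allowed penetration but, in explicit schemes, generally tightens the stable time step, which is one reason a guideline for choosing the contact stiffness matters.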
PIPS-SBB: A Parallel Distributed-Memory Branch-and-Bound Algorithm for Stochastic Mixed-Integer Programs

DOE PAGES

Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak

2016-05-01

Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive formulation mixed-integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve further as more functionality is added in the future.
On the utility of threads for data parallel programming

NASA Technical Reports Server (NTRS)

Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush

1995-01-01

Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
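The overlap pattern discussed in this abstract (post the communication, compute on local data, then wait for the communication to finish) can be sketched as below. A background worker and a `time.sleep` call stand in for an asynchronous communication primitive; that substitution, along with all function names and the delay value, is an assumption made for illustration, and a message-passing code would more typically use a non-blocking receive followed by a wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_receive(delay, payload):
    """Stand-in for an asynchronous receive: waits, then delivers boundary data."""
    time.sleep(delay)
    return payload

def interior_update(values):
    """Useful local work that does not depend on the incoming data."""
    return [2 * v for v in values]

def step_with_overlap(interior, incoming, delay=0.05):
    """Post the communication, compute in the meantime, then wait for it."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_receive, delay, incoming)  # start "communication"
        updated = interior_update(interior)                   # overlapped computation
        boundary = pending.result()                           # block until data arrives
    return updated + boundary

result = step_with_overlap([1, 2, 3], [10, 20])
print(result)
```

The benefit depends on having enough independent local work to hide the communication latency; if the interior update is short, the final wait dominates regardless of how the overlap is implemented.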
Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

NASA Technical Reports Server (NTRS)

Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

2005-01-01

The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed systems present the worst case implications for today's dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirement-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.
A design methodology for portable software on parallel computers

NASA Technical Reports Server (NTRS)

Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.

1993-01-01

This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks is specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. 
We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.
Heterogeneous scalable framework for multiphase flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morris, Karla Vanessa

2013-09-01

Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for the current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computer platforms. This work created an application based on a software architecture where the physics and software concerns are separated in a way that adds flexibility to both. The developed spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other applications. The APIs allow object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects in central processing unit (CPU) and graphics processing unit (GPU) nodes for applications using C++.
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
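Evaluating a matrix function through a polynomial expansion built from sparse matrix products can be sketched in plain Python. This is not NTPoly's interface or its actual algorithms: the dictionary-based sparse storage, Horner's rule, the drop threshold for small entries, and all function names are assumptions chosen to keep the example self-contained.

```python
def sparse_matmul(a, b, threshold=1e-12):
    """Multiply sparse matrices stored as {(row, col): value} dicts.

    Entries with magnitude at or below `threshold` are dropped, which is one
    simple way to keep intermediate products sparse.
    """
    b_rows = {}
    for (k, j), v in b.items():
        b_rows.setdefault(k, []).append((j, v))
    out = {}
    for (i, k), av in a.items():
        for j, bv in b_rows.get(k, []):
            out[(i, j)] = out.get((i, j), 0.0) + av * bv
    return {key: v for key, v in out.items() if abs(v) > threshold}

def add_scaled_identity(m, scale, n):
    """Return m + scale * I for an n-by-n sparse matrix."""
    out = dict(m)
    for i in range(n):
        out[(i, i)] = out.get((i, i), 0.0) + scale
    return out

def matrix_polynomial(a, coeffs, n, threshold=1e-12):
    """Evaluate sum_k coeffs[k] * A**k with Horner's rule."""
    result = add_scaled_identity({}, coeffs[-1], n)
    for c in reversed(coeffs[:-1]):
        result = add_scaled_identity(sparse_matmul(result, a, threshold), c, n)
    return result

# p(x) = 1 + x + x^2 / 2, a truncated expansion of exp(x), applied to diag(1, 2).
diag = {(0, 0): 1.0, (1, 1): 2.0}
p_diag = matrix_polynomial(diag, [1.0, 1.0, 0.5], n=2)
print(p_diag)
```

Each Horner step costs one sparse product, so the total work is governed by the polynomial degree and by how sparse the intermediate matrices remain after thresholding.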
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

PubMed

Halic, Tansel; Ahn, Woojin; De, Suvranu

2014-01-01

This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications, including visualization of complex scenes, real-time physical simulations, and image processing, compared to native ones. The newly proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
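The fork/join model mentioned above can be illustrated independently of pWeb, whose syntax the abstract does not show. The sketch below uses Python's standard `concurrent.futures` module; the function name `fork_join_sum` and the chunking policy are assumptions chosen for illustration. Work is forked into independent tasks, and the join step blocks until every task has returned before the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join_sum(values, n_workers=4):
    """Fork: split the work into chunks; join: wait for all partial sums and combine."""
    chunk = max(1, (len(values) + n_workers - 1) // n_workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(sum, part) for part in parts]  # fork
        return sum(f.result() for f in futures)               # join

total = fork_join_sum(list(range(1000)))
```

In CPython, threads mainly help with I/O-bound work; the point here is the control structure, not the speedup.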
Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Gupta, Manish

1992-01-01

Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. 
These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
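A heavily simplified sketch of the constraints-based idea: each statement contributes a constraint naming a preferred distribution for an array together with a quality measure, and conflicts are resolved by combining the measures. The tuple layout, the distribution names, and the "largest summed quality wins" rule are assumptions made for illustration; the thesis describes a more elaborate resolution aimed at minimizing overall execution time.

```python
from collections import defaultdict

def choose_distributions(constraints):
    """constraints: iterable of (array_name, distribution, quality).

    Returns, for each array, the distribution with the largest summed quality.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for array, dist, quality in constraints:
        totals[array][dist] += quality
    return {array: max(scores, key=scores.get) for array, scores in totals.items()}

# Hypothetical constraints gathered from three statements.
example = [
    ("A", "block", 5.0),
    ("A", "cyclic", 2.0),
    ("A", "cyclic", 2.5),
    ("B", "cyclic", 1.0),
]
decision = choose_distributions(example)
```

Here array `A` receives a block distribution because its single strong constraint outweighs the two weaker cyclic ones.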
GRADSPMHD: A parallel MHD code based on the SPH formalism

NASA Astrophysics Data System (ADS)

Vanaverbeke, S.; Keppens, R.; Poedts, S.

2014-03-01

We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI.
RAM: ~30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

PubMed Central

Shrimankar, D. D.; Sathe, S. R.

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are interesting for a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
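A tile-based organization of alignment can be sketched as follows. This is not the authors' method, which the abstract does not spell out; it is a generic Needleman-Wunsch global alignment score whose dynamic-programming table is processed tile by tile in anti-diagonal ("wavefront") order. The function name, the scoring parameters, and the tile size are assumptions. Tiles on the same anti-diagonal depend only on tiles from earlier anti-diagonals, so they could be handed to different threads or processes; this sketch runs them serially.

```python
def nw_score_tiled(s, t, tile=4, match=1, mismatch=-1, gap=-1):
    """Global alignment score of s and t, filling the DP table one tile at a time."""
    n, m = len(s), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    tiles_i = (n + tile - 1) // tile
    tiles_j = (m + tile - 1) // tile
    # Tiles with equal ti + tj lie on one anti-diagonal and are mutually independent.
    for wave in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, wave - tiles_j + 1), min(tiles_i, wave + 1)):
            tj = wave - ti
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    sub = match if s[i - 1] == t[j - 1] else mismatch
                    score[i][j] = max(score[i - 1][j - 1] + sub,
                                      score[i - 1][j] + gap,
                                      score[i][j - 1] + gap)
    return score[n][m]
```

The tile size trades scheduling overhead against cache reuse and available parallelism, which is why the abstract lists it as a tuning parameter alongside sequence size.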
Parallel Logic Programming and Parallel Systems Software and Hardware

DTIC Science & Technology

1989-07-29

Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Utilisation", Groupe d’Intelligence Artificielle, Universite d...completed. Tools were provided for software development using artificial intelligence techniques. AI software for massively parallel architectures was...using artificial intelligence techniques. AI software for massively parallel architectures was started. 1. Introduction We describe research conducted
The force on the flex: Global parallelism and portability

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1986-01-01

A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were changed to produce directly the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.
Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

NASA Astrophysics Data System (ADS)

Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

It is difficult and cumbersome to detect data races that occur in an execution of parallel programs. Any on-the-fly race detection technique using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than the NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Some experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelisms varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.
User-Defined Data Distributions in High-Level Programming Languages

NASA Technical Reports Server (NTRS)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today's high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
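At its core, a data distribution is a mapping from global array indices to the processor (or locale) that owns them. The sketch below shows the two textbook mappings, block and cyclic, as plain Python functions. It is not Chapel syntax and not the paper's interface; the function names and the ceiling-division block size are assumptions chosen for illustration.

```python
def block_owner(i, n, p):
    """Owner of global index i when n elements are split into contiguous blocks over p owners."""
    block = (n + p - 1) // p  # ceiling division; trailing owners may hold fewer (or no) elements
    return i // block

def cyclic_owner(i, p):
    """Owner of global index i when elements are dealt out round-robin to p owners."""
    return i % p

# Ten elements over four owners.
block_map = [block_owner(i, 10, 4) for i in range(10)]
cyclic_map = [cyclic_owner(i, 4) for i in range(10)]
```

A user-defined distribution in the sense of the abstract would let the programmer supply such a mapping separately from the algorithm, so the same loop nest can be rerun with a different locality pattern.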
Block-Parallel Data Analysis with DIY2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
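The three steps named in the abstract (decompose data into blocks, assign blocks to processing elements, iterate over the blocks) can be mimicked in a few lines. The sketch below is a serial toy and does not use the DIY2 library or its C++ API; `decompose`, `assign_round_robin`, and `foreach_block` are hypothetical names introduced for this example.

```python
def decompose(n_items, n_blocks):
    """Split range(n_items) into n_blocks contiguous (start, stop) bounds."""
    base, extra = divmod(n_items, n_blocks)
    bounds, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def assign_round_robin(n_blocks, n_ranks):
    """Map each rank to the list of block ids it owns."""
    return {r: list(range(r, n_blocks, n_ranks)) for r in range(n_ranks)}

def foreach_block(data, bounds, block_ids, fn):
    """Apply fn to each locally owned block; returns {block_id: result}."""
    return {b: fn(data[bounds[b][0]:bounds[b][1]]) for b in block_ids}

data = list(range(10))
bounds = decompose(len(data), 4)
owners = assign_round_robin(4, 2)
partial = {rank: foreach_block(data, bounds, ids, sum) for rank, ids in owners.items()}
```

Because the computation is written per block rather than per process, the number of blocks can exceed the number of ranks, which is what lets a runtime keep some blocks out of core or run several at once.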

Solving Partial Differential Equations in a data-driven multiprocessor environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

1988-12-31

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-flow architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asynchronous methods in a data-driven environment. New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.
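The Jacobi method named above is easy to state in code, which also shows why it suits parallel and data-driven execution. The sketch below solves the 1-D Laplace equation with fixed boundary values; the function name and the problem setup are assumptions for illustration and are unrelated to the data-flow language used in the report. It is the synchronous variant only: chaotic relaxation would instead let each point update with whatever neighbor values happen to be available.

```python
def jacobi_laplace_1d(left, right, n_interior, sweeps):
    """Jacobi sweeps for u'' = 0 on a uniform grid with fixed end values."""
    u = [left] + [0.0] * n_interior + [right]
    for _ in range(sweeps):
        # Each new value reads only the previous sweep, so all interior
        # updates within one sweep are independent of one another.
        u = [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]
    return u

solution = jacobi_laplace_1d(0.0, 1.0, 3, 200)
```

The iteration converges to the straight line between the two boundary values, so the three interior points here approach 0.25, 0.5, and 0.75.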
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

PubMed

Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

2016-01-01

Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
Visual analysis of inter-process communication for large-scale parallel computing.

PubMed

Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

2009-01-01

In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0.

PubMed

Jiang, Xiaohui; Kumar, Kamal; Hu, Xin; Wallqvist, Anders; Reifman, Jaques

2008-09-08

Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.
6th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG) applications and beyond, April 25-27, 2013, Riga, Latvia.

PubMed

Alzaid, Aus; Schlaeger, Christof; Hinzmann, Rolf

2013-12-01

International experts in the fields of diabetes, diabetes technology, endocrinology, and pediatrics gathered for the 6th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG) Applications and beyond. The aim of this meeting was to continue setting up a global network of experts in this field and provide an international platform for exchange of ideas to improve life for people with diabetes. The 2013 meeting comprised a comprehensive scientific program, parallel interactive workshops, and two keynote lectures. All these discussions were intended to help identify gaps and areas where further scientific work and clinical studies are warranted.
Transglutaminase induction by various cell death and apoptosis pathways.

PubMed

Fesus, L; Madi, A; Balajthy, Z; Nemes, Z; Szondy, Z

1996-10-31

Clarification of the molecular details of forms of natural cell death, including apoptosis, has become one of the most challenging issues of contemporary biomedical sciences. One of the effector elements of various cell death pathways is the covalent cross-linking of cellular proteins by transglutaminases. This review will discuss the accumulating data related to the induction and regulation of these enzymes, particularly of tissue type transglutaminase, in the molecular program of cell death. A wide range of signalling pathways can lead to the parallel induction of apoptosis and transglutaminase, providing a handle for better understanding the exact molecular interactions responsible for the mechanism of regulated cell death.
Parallel machine architecture and compiler design facilities

NASA Technical Reports Server (NTRS)

Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

1990-01-01

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

1999-01-01

As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Coding coarse grained polymer model for LAMMPS and its application to polymer crystallization

NASA Astrophysics Data System (ADS)

Luo, Chuanfu; Sommer, Jens-Uwe

2009-08-01

We present a patch code for LAMMPS to implement a coarse grained (CG) model of poly(vinyl alcohol) (PVA). LAMMPS is a powerful molecular dynamics (MD) simulator developed at Sandia National Laboratories. Our patch code implements tabulated angular potential and Lennard-Jones-9-6 (LJ96) style interaction for PVA. Benefiting from the excellent parallel efficiency of LAMMPS, our patch code is suitable for large-scale simulations. This CG-PVA code is used to study polymer crystallization, which is a long-standing unsolved problem in polymer physics. By using parallel computing, cooling and heating processes for long chains are simulated. The results show that chain-folded structures resembling the lamellae of polymer crystals are formed during the cooling process. The evolution of the static structure factor during the crystallization transition indicates that long-range density order appears before local crystalline packing. This is consistent with some experimental observations by small/wide angle X-ray scattering (SAXS/WAXS). During the heating process, it is found that the crystalline regions are still growing until they are fully melted, which can be confirmed by the evolution both of the static structure factor and average stem length formed by the chains. This two-stage behavior indicates that melting of polymer crystals is far from thermodynamic equilibrium. Our results concur with various experiments. It is the first time that such growth/reorganization behavior is clearly observed by MD simulations. Our code can be easily used to model other types of polymers by providing a file containing the tabulated angle potential data and a set of appropriate parameters.
Program summary. Program title: lammps-cgpva Catalogue identifier: AEDE_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDE_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU's GPL No. of lines in distributed program, including test data, etc.: 940 798 No. of bytes in distributed program, including test data, etc.: 12 536 245 Distribution format: tar.gz Programming language: C++/MPI Computer: Tested on Intel-x86 and AMD64 architectures. Should run on any architecture providing a C++ compiler Operating system: Tested under Linux. Any other OS with C++ compiler and MPI library should suffice Has the code been vectorized or parallelized?: Yes RAM: Depends on system size and how many CPUs are used Classification: 7.7 External routines: LAMMPS (http://lammps.sandia.gov/), FFTW (http://www.fftw.org/) Nature of problem: Implementing special tabular angle potentials and Lennard-Jones-9-6 style interactions of a coarse grained polymer model for LAMMPS code. Solution method: Cubic spline interpolation of input tabulated angle potential data. Restrictions: The code is based on a former version of LAMMPS. Unusual features: Any special angular potential can be used if it can be tabulated. Running time: Seconds to weeks, depending on system size, speed of CPU and how many CPUs are used. The test run provided with the package takes about 5 minutes on 4 AMD Opteron (2.6 GHz) CPUs. References: D. Reith, H. Meyer, F. Müller-Plathe, Macromolecules 34 (2001) 2335-2345. H. Meyer, F. Müller-Plathe, J. Chem. Phys. 115 (2001) 7807. H. Meyer, F. Müller-Plathe, Macromolecules 35 (2002) 1241-1252.
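For readers unfamiliar with the 9-6 variant of the Lennard-Jones potential, a short sketch may help. The functional form below, E(r) = 4ε[(σ/r)^9 − (σ/r)^6], follows the convention that LAMMPS documents for its `lj96/cut` pair style; whether the patch described here uses the same prefactor is an assumption, and the function names are hypothetical. The position of the minimum, r = 1.5^(1/3) σ, does not depend on the prefactor.

```python
def lj96_energy(r, epsilon, sigma):
    """9-6 Lennard-Jones energy, E(r) = 4*epsilon*((sigma/r)**9 - (sigma/r)**6)."""
    x = sigma / r
    return 4.0 * epsilon * (x ** 9 - x ** 6)

def lj96_minimum(epsilon, sigma):
    """Location and depth of the potential minimum, from dE/dr = 0."""
    r_min = 1.5 ** (1.0 / 3.0) * sigma
    return r_min, lj96_energy(r_min, epsilon, sigma)

r_min, e_min = lj96_minimum(1.0, 1.0)
```

With this prefactor the well depth is 16ε/27 rather than ε, and the softer 9th-power repulsion (compared with the usual 12th power) is one reason the form is popular for coarse-grained beads.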
DOE SBIR Phase-1 Report on Hybrid CPU-GPU Parallel Development of the Eulerian-Lagrangian Barracuda Multiphase Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dr. Dale M. Snider

2011-02-28

This report gives the results of the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Units or Graphics Processing Units). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles, giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendly environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the groundwork for the complete parallelization of Barracuda in Phase-2, with the caveat that implemented computer practices for parallel programming done in Phase-1 give immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully, laying the framework for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days.
Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate). Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics as the Phase-1 technical partner. See Section Technical Objective and Approach). Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections). Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program). Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections). Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2). With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.
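The remark that a single function's speedup generally exceeds the whole code's speedup is Amdahl's law. The sketch below makes the arithmetic explicit; the function names are hypothetical, and the runtime fractions are illustrative because the abstract gives no profile data.

```python
def overall_speedup(parallel_fraction, section_speedup):
    """Amdahl's law: whole-program speedup when only part of the runtime is accelerated."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / section_speedup)

def required_fraction(target_speedup, section_speedup):
    """Fraction of runtime that must be accelerated to reach target_speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / section_speedup)

# How much of the runtime must run 12x faster for the whole code to be 10x faster?
needed = required_fraction(10.0, 12.0)
days_after = 30.0 / 10.0  # a 30-day job at an overall 10x speedup
```

If every accelerated section ran at exactly 12x, about 98% of the original runtime would have to be covered to reach 10x overall. This illustrates why the report ties the overall target to complete parallelization in Phase-2 rather than to the single Phase-1 function.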
Monitoring Data-Structure Evolution in Distributed Message-Passing Programs

NASA Technical Reports Server (NTRS)

Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)

1996-01-01

Monitoring the evolution of data structures in parallel and distributed programs is critical for debugging their semantics and performance. However, the current state of the art in tracking and presenting data-structure information in parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents) on specific nodes can be determined using traditional debuggers after the data structure evolution leading to the semantic error is observed graphically.
Command/response protocols and concurrent software

NASA Technical Reports Server (NTRS)

Bynum, W. L.

1987-01-01

A version of the program to control the parallel jaw gripper is documented. The parallel jaw end-effector hardware and the Intel 8031 processor that is used to control the end-effector are briefly described. A general overview of the controller program is given, and a complete description of the program's structure and design is included. There are three appendices: a memory map of the on-chip RAM, a cross-reference listing of the self-scheduling routines, and a summary of the top-level and monitor commands.
Interaction of upgoing auroral H(+) and O(+) beams

NASA Technical Reports Server (NTRS)

Kaufmann, R. L.; Ludlow, G. R.; Collin, H. L.; Peterson, W. K.; Burch, J. L.

1986-01-01

Data from the S3-3 and DE 1 satellites are analyzed to study the interaction between H(+) and O(+) ions in upgoing auroral beams. Every data set analyzed showed some evidence of an interaction. The measured plasma was found to be unstable to a low-frequency electrostatic wave that propagates at an oblique angle to vector-B(0). A second wave, which can propagate parallel to vector-B(0), is weakly damped in the plasma studied in most detail. It is likely that the upgoing ion beams generate this parallel wave at lower altitudes. The resulting wave-particle interactions qualitatively can explain most of the features observed in ion distribution functions.
Computer programs for adjusting the mechanical properties of 2-inch dimension lumber for changes in moisture content

Treesearch

James W. Evans; Jane K. Evans; David W. Green

1990-01-01

This paper presents computer programs for adjusting the mechanical properties of 2-in. dimension lumber for changes in moisture content. Mechanical properties adjusted are modulus of rupture, ultimate tensile stress parallel to the grain, ultimate compressive stress parallel to the grain, and flexural modulus of elasticity. The models are valid for moisture contents...
NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

NASA Astrophysics Data System (ADS)

Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

2010-09-01

The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance.
Program summary
Program title: NWChem
Catalogue identifier: AEGI_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Open Source Educational Community License
No. of lines in distributed program, including test data, etc.: 11 709 543
No. of bytes in distributed program, including test data, etc.: 680 696 106
Distribution format: tar.gz
Programming language: Fortran 77, C
Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines
Operating system: Linux, OS X, Windows
Has the code been vectorised or parallelized?: Code is parallelized
Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13
Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of many-electron Hamiltonian, analysis of the potential energy surface, and dynamics.
Solution method: Ground and excited solutions of many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions or a combination thereof with classical descriptions are then used to analyze potential energy surface and perform dynamical simulations.
Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or Email is requested. Instead an HTML file giving details of how the program can be obtained is sent.
Running time: Running time depends on the size of the chemical system, complexity of the method, number of CPUs and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulation on hundreds of atoms.
Aggregation and Gelation of Aromatic Polyamides with Parallel and Anti-parallel Alignment of Molecular Dipole Along the Backbone

NASA Astrophysics Data System (ADS)

Zhu, Dan; Shang, Jing; Ye, Xiaodong; Shen, Jian

2016-12-01

The understanding of macromolecular structures and interactions is important but difficult, because macromolecules adopt versatile conformations and aggregate states, which vary with environmental conditions and histories. In this work two polyamides with parallel or anti-parallel dipoles along the linear backbone, named ABAB (parallel) and AABB (anti-parallel), have been studied. By using a combination of methods, the phase behaviors of the polymers during aggregation and gelation, i.e., the formation or dissociation of nuclei and fibrils, clusters of fibrils, and cluster-cluster aggregation, have been revealed. Such abundant phase behaviors are dominated by the inter-chain interactions, including dispersion, polarity and hydrogen bonding, and are correlated with the solubility parameters of the solvents, the temperature, and the polymer concentration. The results of X-ray diffraction and fast-mode dielectric relaxation indicate that AABB possesses a more rigid conformation than ABAB. Because AABB aggregates are long fibers while ABAB aggregates are hairy fibril clusters, the gelation concentration in toluene is 1 w/v% for AABB, lower than the 3 w/v% for ABAB.
Solution of the Skyrme-Hartree-Fock-Bogolyubov equations in the Cartesian deformed harmonic-oscillator basis.. (VII) HFODD (v2.49t): A new version of the program

NASA Astrophysics Data System (ADS)

Schunck, N.; Dobaczewski, J.; McDonnell, J.; Satuła, W.; Sheikh, J. A.; Staszczak, A.; Stoitsov, M.; Toivanen, P.

2012-01-01

We describe the new version (v2.49t) of the code HFODD which solves the nuclear Skyrme-Hartree-Fock (HF) or Skyrme-Hartree-Fock-Bogolyubov (HFB) problem by using the Cartesian deformed harmonic-oscillator basis. In the new version, we have implemented the following physics features: (i) the isospin mixing and projection, (ii) the finite-temperature formalism for the HFB and HF + BCS methods, (iii) the Lipkin translational energy correction method, (iv) the calculation of the shell correction. A number of specific numerical methods have also been implemented in order to deal with large-scale multi-constraint calculations and hardware limitations: (i) the two-basis method for the HFB method, (ii) the Augmented Lagrangian Method (ALM) for multi-constraint calculations, (iii) the linear constraint method based on the approximation of the RPA matrix for multi-constraint calculations, (iv) an interface with the axial and parity-conserving Skyrme-HFB code HFBTHO, (v) the mixing of the HF or HFB matrix elements instead of the HF fields. Special care has been paid to using the code on massively parallel leadership class computers. For this purpose, the following features are now available with this version: (i) the Message Passing Interface (MPI) framework, (ii) scalable input data routines, (iii) multi-threading via OpenMP pragmas, (iv) parallel diagonalization of the HFB matrix in the simplex-breaking case using the ScaLAPACK library. Finally, several minor errors of the previous published version were corrected.
New version program summary
Program title: HFODD (v2.49t)
Catalogue identifier: ADFL_v3_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADFL_v3_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public Licence v3
No. of lines in distributed program, including test data, etc.: 190 614
No. of bytes in distributed program, including test data, etc.: 985 898
Distribution format: tar.gz
Programming language: FORTRAN-90
Computer: Intel Pentium-III, Intel Xeon, AMD-Athlon, AMD-Opteron, Cray XT4, Cray XT5
Operating system: UNIX, LINUX, Windows XP
Has the code been vectorized or parallelized?: Yes, parallelized using MPI
RAM: 10 Mwords
Word size: The code is written in single-precision for the use on a 64-bit processor. The compiler option -r8 or +autodblpad (or equivalent) has to be used to promote all real and complex single-precision floating-point items to double precision when the code is used on a 32-bit machine.
Classification: 17.22
Catalogue identifier of previous version: ADFL_v2_2
Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2361
External routines: The user must have access to the NAGLIB subroutine f02axe, or LAPACK subroutines zhpev, zhpevx, zheevr, or zheevd, which diagonalize complex hermitian matrices, the LAPACK subroutines dgetri and dgetrf which invert arbitrary real matrices, the LAPACK subroutines dsyevd, dsytrf and dsytri which compute eigenvalues and eigenfunctions of real symmetric matrices, the LINPACK subroutines zgedi and zgeco, which invert arbitrary complex matrices and calculate determinants, the BLAS routines dcopy, dscal, dgemm and dgemv for double-precision linear algebra and zcopy, zdscal, zgemm and zgemv for complex linear algebra, or provide another set of subroutines that can perform such tasks. The BLAS and LAPACK subroutines can be obtained from the Netlib Repository at the University of Tennessee, Knoxville: http://netlib2.cs.utk.edu/.
Does the new version supersede the previous version?: Yes
Nature of problem: The nuclear mean field and an analysis of its symmetries in realistic cases are the main ingredients of a description of nuclear states. Within the Local Density Approximation, or for a zero-range velocity-dependent Skyrme interaction, the nuclear mean field is local and velocity dependent. The locality allows for an effective and fast solution of the self-consistent Hartree-Fock equations, even for heavy nuclei, and for various nucleonic (n-particle-n-hole) configurations, deformations, excitation energies, or angular momenta. Similarly, Local Density Approximation in the particle-particle channel, which is equivalent to using a zero-range interaction, allows for a simple implementation of pairing effects within the Hartree-Fock-Bogolyubov method.
Solution method: The program uses the Cartesian harmonic oscillator basis to expand single-particle or single-quasiparticle wave functions of neutrons and protons interacting by means of the Skyrme effective interaction and zero-range pairing interaction. The expansion coefficients are determined by the iterative diagonalization of the mean-field Hamiltonians or Routhians which depend non-linearly on the local neutron and proton densities. Suitable constraints are used to obtain states corresponding to a given configuration, deformation or angular momentum. The method of solution has been presented in: [J. Dobaczewski, J. Dudek, Comput. Phys. Commun. 102 (1997) 166].
Reasons for new version: Version 2.49s of HFODD provides a number of new options such as the isospin mixing and projection of the Skyrme functional, the finite-temperature HF and HFB formalism and optimized methods to perform multi-constrained calculations. It is also the first version of HFODD to contain threading and parallel capabilities.
Summary of revisions: Isospin mixing and projection of the HF states has been implemented. The finite-temperature formalism for the HFB equations has been implemented. The Lipkin translational energy correction method has been implemented. Calculation of the shell correction has been implemented. The two-basis method for the solution to the HFB equations has been implemented. The Augmented Lagrangian Method (ALM) for calculations with multiple constraints has been implemented. The linear constraint method based on the cranking approximation of the RPA matrix has been implemented. An interface between HFODD and the axially-symmetric and parity-conserving code HFBTHO has been implemented. The mixing of the matrix elements of the HF or HFB matrix has been implemented. A parallel interface using the MPI library has been implemented. A scalable model for reading input data has been implemented. OpenMP pragmas have been implemented in three subroutines. The diagonalization of the HFB matrix in the simplex-breaking case has been parallelized using the ScaLAPACK library. Several minor errors of the previous published version were corrected.
Running time: In serial mode, running 6 HFB iterations for 152Dy for conserved parity and signature symmetries in a full spherical basis of N=14 shells takes approximately 8 min on an AMD Opteron processor at 2.6 GHz, assuming standard BLAS and LAPACK libraries. As a rule of thumb, runtime for HFB calculations for parity and signature conserved symmetries roughly increases as N, where N is the number of full HO shells. Using custom-built optimized BLAS and LAPACK libraries (such as in the ATLAS implementation) can bring down the execution time by 60%. Using the threaded version of the code with 12 threads and threaded BLAS libraries can bring an additional factor 2 speed-up, so that the same 6 HFB iterations now take on the order of 2 min 30 s.
Pediatric trainees' engagement in the online nutrition curriculum: preliminary results.

PubMed

Lewis, Kadriye O; Frank, Graeme R; Nagel, Rollin; Turner, Teri L; Ferrell, Cynthia L; Sangvai, Shilpa G; Donthi, Rajesh; Mahan, John D

2014-09-16

The Pediatric Nutrition Series (PNS) consists of ten online, interactive modules and supplementary educational materials that have utilized web-based multimedia technologies to offer nutrition education for pediatric trainees and practicing physicians. The purpose of the study was to evaluate pediatric trainees' engagement, knowledge acquisition, and satisfaction with nutrition modules delivered online in interactive and non-interactive formats. From December 2010 through August 2011, pediatric trainees from seventy-three (73) different U.S. programs completed online nutrition modules designed to develop residents' knowledge of counseling around and management of nutritional issues in children. Data were analyzed using SPSS version 19. Both descriptive and inferential statistics were used in comparing interactive versus non-interactive modules. Pretest/posttest and module evaluations measured knowledge acquisition and satisfaction. Three hundred and twenty-two (322) pediatric trainees completed one or more of six modules for a total of four hundred and forty-two (442) accessions. All trainees who completed at least one module were included in the study. Two-way analyses of variance (ANOVA) with repeated measures (pre/posttest by interactive/non-interactive format) indicated significant knowledge gains from pretest to posttest (p < 0.002 for all six modules). Comparisons between interactive and non-interactive formats for Module 1 (N = 85 interactive, N = 95 non-interactive) and Module 5 (N = 5 interactive, N = 16 non-interactive) indicated a parallel improvement from the pretest to posttest, with the interactive format significantly higher than the non-interactive modules (p < .05). Both qualitative and quantitative data from module evaluations demonstrated that satisfaction with modules was high. However, there were lower ratings for whether learning objectives were met with Module 6 (p < 0.03) and lecturer rating (p < 0.004) compared to Module 1. 
Qualitative data also showed that completion of the interactive modules resulted in higher resident satisfaction. This initial assessment of the PNS modules shows that technology-mediated delivery of a nutrition curriculum in residency programs has great potential for providing rich learning environments for trainees while maintaining a high level of participant satisfaction.
Parallel computations and control of adaptive structures

NASA Technical Reports Server (NTRS)

Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

1991-01-01

The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the presented approach on applications in disciplines other than the aerospace industry is assessed.
Petascale Simulation Initiative Tech Base: FY2007 Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

May, J; Chen, R; Jefferson, D

The Petascale Simulation Initiative began as an LDRD project in the middle of Fiscal Year 2004. The goal of the project was to develop techniques to allow large-scale scientific simulation applications to better exploit the massive parallelism that will come with computers running at petaflops per second. One of the major products of this work was the design and prototype implementation of a programming model and a runtime system that lets applications extend data-parallel applications to use task parallelism. By adopting task parallelism, applications can use processing resources more flexibly, exploit multiple forms of parallelism, and support more sophisticated multiscale and multiphysics models. Our programming model was originally called the Symponents Architecture but is now known as Cooperative Parallelism, and the runtime software that supports it is called Coop. (However, we sometimes refer to the programming model as Coop for brevity.) We have documented the programming model and runtime system in a submitted conference paper [1]. This report focuses on the specific accomplishments of the Cooperative Parallelism project (as we now call it) under Tech Base funding in FY2007. Development and implementation of the model under LDRD funding alone proceeded to the point of demonstrating a large-scale materials modeling application using Coop on more than 1300 processors by the end of FY2006. Beginning in FY2007, the project received funding from both LDRD and the Computation Directorate Tech Base program. Later in the year, after the three-year term of the LDRD funding ended, the ASC program supported the project with additional funds. The goal of the Tech Base effort was to bring Coop from a prototype to a production-ready system that a variety of LLNL users could work with. Specifically, the major tasks that we planned for the project were: (1) Port SARS [former name of the Coop runtime system] to another LLNL platform, probably Thunder or Peloton (depending on when Peloton becomes available); (2) Improve SARS's robustness and ease-of-use, and develop user documentation; and (3) Work with LLNL code teams to help them determine how Symponents could benefit their applications. The original funding request was $296,000 for the year, and we eventually received $252,000. The remainder of this report describes our efforts and accomplishments for each of the goals listed above.

Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a network of SPARCstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
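The idea of compensating a constant clock skew so that no message appears to travel backwards in time can be illustrated with a minimal sketch. This is not the AIMS algorithm, whose details the abstract does not give. It assumes a hypothetical trace of (send time on the parent's clock, receive time on the child's clock) pairs and a zero-drift offset, and the names `min_offset` and `compensate` are invented for illustration.

```python
def min_offset(messages, min_latency=0.0):
    """Smallest constant to add to the receiver's clock so that every
    receive happens at least `min_latency` after its matching send.

    `messages` is a list of (send_time, recv_time) pairs, where the two
    times come from different, possibly skewed, clocks (an assumption).
    """
    worst = max(send - recv for send, recv in messages)
    return max(0.0, worst + min_latency)


def compensate(messages, offset):
    """Shift all receiver timestamps by the constant offset."""
    return [(send, recv + offset) for send, recv in messages]


# Hypothetical trace: the first two messages appear to arrive before
# they were sent, which is physically impossible.
trace = [(10.0, 9.5), (20.0, 19.8), (30.0, 30.4)]
offset = min_offset(trace)
fixed = compensate(trace, offset)
```

The sketch only corrects one direction of traffic. A real tool would also bound the offset from above using messages sent in the opposite direction.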
MOLNs: A CLOUD PLATFORM FOR INTERACTIVE, REPRODUCIBLE, AND SCALABLE SPATIAL STOCHASTIC COMPUTATIONAL EXPERIMENTS IN SYSTEMS BIOLOGY USING PyURDME.

PubMed

Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas

2016-01-01

Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
Declarative language design for interactive visualization.

PubMed

Heer, Jeffrey; Bostock, Michael

2010-01-01

We investigate the design of declarative, domain-specific languages for constructing interactive visualizations. By separating specification from execution, declarative languages can simplify development, enable unobtrusive optimization, and support retargeting across platforms. We describe the design of the Protovis specification language and its implementation within an object-oriented, statically-typed programming language (Java). We demonstrate how to support rich visualizations without requiring a toolkit-specific data model and extend Protovis to enable declarative specification of animated transitions. To support cross-platform deployment, we introduce rendering and event-handling infrastructures decoupled from the runtime platform, letting designers retarget visualization specifications (e.g., from desktop to mobile phone) with reduced effort. We also explore optimizations such as runtime compilation of visualization specifications, parallelized execution, and hardware-accelerated rendering. We present benchmark studies measuring the performance gains provided by these optimizations and compare performance to existing Java-based visualization tools, demonstrating scalability improvements exceeding an order of magnitude.
Stochastic optimization of GeantV code by use of genetic algorithms

DOE PAGES

Amadio, G.; Apostolakis, J.; Bandieramonte, M.; ...

2017-10-01

GeantV is a complex system based on the interaction of different modules needed for detector simulation, which include transport of particles in fields, physics models simulating their interactions with matter and a geometrical modeler library for describing the detector and locating the particles and computing the path length to the current volume boundary. The GeantV project is recasting the classical simulation approach to get maximum benefit from SIMD/MIMD computational architectures and highly massive parallel systems. This involves finding the appropriate balance between several aspects influencing computational performance (floating-point performance, usage of off-chip memory bandwidth, specification of cache hierarchy, etc.) and handling a large number of program parameters that have to be optimized to achieve the best simulation throughput. This optimization task can be treated as a black-box optimization problem, which requires searching the optimum set of parameters using only point-wise function evaluations. Here, the goal of this study is to provide a mechanism for optimizing complex systems (high energy physics particle transport simulations) with the help of genetic algorithms and evolution strategies as tuning procedures for massive parallel simulations. One of the described approaches is based on introducing a specific multivariate analysis operator that could be used in case of resource expensive or time consuming evaluations of fitness functions, in order to speed-up the convergence of the black-box optimization problem.
Stochastic optimization of GeantV code by use of genetic algorithms

NASA Astrophysics Data System (ADS)

Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Behera, S. P.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Hariri, F.; Jun, S. Y.; Konstantinov, D.; Kumawat, H.; Ivantchenko, V.; Lima, G.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.

2017-10-01

GeantV is a complex system based on the interaction of different modules needed for detector simulation, which include transport of particles in fields, physics models simulating their interactions with matter and a geometrical modeler library for describing the detector and locating the particles and computing the path length to the current volume boundary. The GeantV project is recasting the classical simulation approach to get maximum benefit from SIMD/MIMD computational architectures and highly massive parallel systems. This involves finding the appropriate balance between several aspects influencing computational performance (floating-point performance, usage of off-chip memory bandwidth, specification of cache hierarchy, etc.) and handling a large number of program parameters that have to be optimized to achieve the best simulation throughput. This optimization task can be treated as a black-box optimization problem, which requires searching the optimum set of parameters using only point-wise function evaluations. The goal of this study is to provide a mechanism for optimizing complex systems (high energy physics particle transport simulations) with the help of genetic algorithms and evolution strategies as tuning procedures for massive parallel simulations. One of the described approaches is based on introducing a specific multivariate analysis operator that could be used in case of resource expensive or time consuming evaluations of fitness functions, in order to speed-up the convergence of the black-box optimization problem.
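Black-box tuning with a genetic algorithm, using only point-wise evaluations of a fitness function, can be sketched as follows. This is a generic textbook-style illustration, not the GeantV tuning procedure. The function `evolve`, its parameters, and the toy `throughput` function are assumptions made for the example, and the multivariate analysis operator mentioned in the abstract is not modeled.

```python
import random


def evolve(fitness, bounds, pop_size=20, generations=40, sigma=0.2, seed=0):
    """Maximize `fitness` over box `bounds` using only function evaluations.

    Uses truncation selection (the best half survives unchanged),
    uniform crossover, and clipped Gaussian mutation.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # elitist survivors
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # crossover
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < 0.3:         # mutate some genes
                    step = rng.gauss(0.0, sigma * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def throughput(params):
    """Toy stand-in for a measured simulation throughput (hypothetical)."""
    x, y = params
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


bounds = [(-10.0, 10.0), (-10.0, 10.0)]
best = evolve(throughput, bounds)
```

Because the best half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next. In a real tuning run each call to the fitness function would be an expensive simulation, which is why the number of evaluations matters.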
Velocity statistics for interacting edge dislocations in one dimension from Dyson's Coulomb gas model.

PubMed

Jafarpour, Farshid; Angheluta, Luiza; Goldenfeld, Nigel

2013-10-01

The dynamics of edge dislocations with parallel Burgers vectors, moving in the same slip plane, is mapped onto Dyson's model of a two-dimensional Coulomb gas confined in one dimension. We show that the tail distribution of the velocity of dislocations is power law in form, as a consequence of the pair interaction of nearest neighbors in one dimension. In two dimensions, we show the presence of a pairing phase transition in a system of interacting dislocations with parallel Burgers vectors. The scaling exponent of the velocity distribution at effective temperatures well below this pairing transition temperature can be derived from the nearest-neighbor interaction, while near the transition temperature, the distribution deviates from the form predicted by the nearest-neighbor interaction, suggesting the presence of collective effects.
A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Sargent, Jeff Scott

1988-01-01

A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The algorithm runs 5 to 16 times faster than a previous program developed for the Hypercube, while producing placements of equivalent quality. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
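The idea of grouping noninteracting cells can be sketched with a greedy graph coloring in Python. This is an illustrative assumption, not the heuristic from the report: here two cells are taken to "interact" when they share a net, and the function name `color_cells` and the example netlist are hypothetical.

```python
def color_cells(nets):
    """Greedy coloring: cells that share a net never receive the same color.

    `nets` maps a net name to the list of cells connected to it.
    Returns a dict mapping each cell to a small integer color.
    """
    # Build the interaction graph: an edge joins any two cells on a common net.
    neighbours = {}
    for cells in nets.values():
        for c in cells:
            neighbours.setdefault(c, set()).update(x for x in cells if x != c)
    colors = {}
    # Color the most-connected cells first (a common greedy ordering).
    for cell in sorted(neighbours, key=lambda c: -len(neighbours[c])):
        used = {colors[n] for n in neighbours[cell] if n in colors}
        colors[cell] = next(k for k in range(len(neighbours) + 1) if k not in used)
    return colors

# Hypothetical netlist with four cells and three nets.
example_nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["c", "d", "a"]}
example_colors = color_cells(example_nets)
```

Under this assumption, all cells of one color can be moved in the same parallel step, because no two of them contribute to the wire length of a common net.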
Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
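The basic work-stealing mechanism can be sketched as a small single-threaded simulation in Python. This is a generic illustration, not the system described in the paper: the function name `run_with_work_stealing` is hypothetical, all tasks start on one worker to force an imbalance, and GPU execution, separate memory domains, and data-movement costs are not modeled.

```python
import random
from collections import deque

def run_with_work_stealing(task_costs, n_workers=4, seed=0):
    """Simulate work stealing; returns (makespan, tasks per worker, steal count)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(task_costs)        # deliberately imbalanced start
    clock = [0] * n_workers             # simulated time at which each worker is free
    executed = [[] for _ in range(n_workers)]
    steals = 0
    remaining = len(task_costs)
    while remaining:
        # The worker that becomes free earliest acts next.
        w = min(range(n_workers), key=lambda i: clock[i])
        if queues[w]:
            cost = queues[w].pop()      # owner works from the tail of its own deque
        else:
            # Idle worker: pick a random victim that still has queued work
            # and steal from the head of its deque.
            victims = [v for v in range(n_workers) if v != w and queues[v]]
            cost = queues[rng.choice(victims)].popleft()
            steals += 1
        clock[w] += cost
        executed[w].append(cost)
        remaining -= 1
    return max(clock), executed, steals

costs = [5, 1, 1, 1, 1, 1, 1, 1]
makespan, per_worker, steal_count = run_with_work_stealing(costs)
```

Taking work from opposite ends of the deque is the usual design choice: the owner keeps recently created (cache-warm) tasks, while thieves take the oldest ones and rarely contend with the owner.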
Mechanical Behavior of Collagen-Fibrin Co-Gels Reflects Transition From Series to Parallel Interactions With Increasing Collagen Content

PubMed Central

Lai, Victor K.; Lake, Spencer P.; Frey, Christina R.; Tranquillo, Robert T.; Barocas, Victor H.

2012-01-01

Fibrin and collagen, biopolymers occurring naturally in the body, are biomaterials commonly used as scaffolds for tissue engineering. How collagen and fibrin interact to confer macroscopic mechanical properties in collagen-fibrin composite systems remains poorly understood. In this study, we formulated collagen-fibrin co-gels at different collagen-to-fibrin ratios to observe changes in the overall mechanical behavior and microstructure. A modeling framework of a two-network system was developed by modifying our micro-scale model, considering two forms of interaction between the networks: (a) two interpenetrating but noninteracting networks (“parallel”), and (b) a single network consisting of randomly alternating collagen and fibrin fibrils (“series”). Mechanical testing of our gels shows that collagen-fibrin co-gels exhibit intermediate properties (UTS, strain at failure, tangent modulus) compared to those of pure collagen and fibrin. The comparison with model predictions shows that the parallel and series model cases provide upper and lower bounds, respectively, for the experimental data, suggesting that a combination of such interactions exists between the collagen and fibrin in co-gels. A transition from the series model to the parallel model occurs with increasing collagen content, with the series model best describing predominantly fibrin co-gels, and the parallel model best describing predominantly collagen co-gels. PMID:22482659
The language parallel Pascal and other aspects of the massively parallel processor

NASA Technical Reports Server (NTRS)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Automatic recognition of vector and parallel operations in a higher level language

NASA Technical Reports Server (NTRS)

Schneck, P. B.

1971-01-01

A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
Swellix: a computational tool to explore RNA conformational space.

PubMed

Sloat, Nathan; Liu, Jui-Wen; Schroeder, Susan J

2017-11-21

The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1996-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.
The Intermediate Neutrino Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adams, C.; Alonso, J. R.; Ankowski, A. M.

2017-04-03

The US neutrino community gathered at the Workshop on the Intermediate Neutrino Program (WINP) at Brookhaven National Laboratory February 4-6, 2015 to explore opportunities in neutrino physics over the next five to ten years. Scientists from particle, astroparticle and nuclear physics participated in the workshop. The workshop examined promising opportunities for neutrino physics in the intermediate term, including possible new small to mid-scale experiments, US contributions to large experiments, upgrades to existing experiments, R&D plans and theory. The workshop was organized into two sets of parallel working group sessions, divided by physics topics and technology. Physics working groups covered topics on Sterile Neutrinos, Neutrino Mixing, Neutrino Interactions, Neutrino Properties and Astrophysical Neutrinos. Technology sessions were organized into Theory, Short-Baseline Accelerator Neutrinos, Reactor Neutrinos, Detector R&D and Source, Cyclotron and Meson Decay at Rest sessions. This report summarizes discussion and conclusions from the workshop.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

NASA Technical Reports Server (NTRS)

Sues, R. H.; Lua, Y. J.; Smith, M. D.

1994-01-01

The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
ms2: A molecular simulation tool for thermodynamic properties

NASA Astrophysics Data System (ADS)

Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

2011-11-01

This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 have been shown for a large variety of fluids in preceding work.
Program summary
Program title: ms2
Catalogue identifier: AEJF_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Special Licence supplied by the authors
No. of lines in distributed program, including test data, etc.: 82 794
No. of bytes in distributed program, including test data, etc.: 793 705
Distribution format: tar.gz
Programming language: Fortran90
Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.)
Operating system: Unix/Linux, Windows
Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol
Scalability: Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations.
RAM: ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules.
Classification: 7.7, 7.9, 12
External routines: Message Passing Interface (MPI)
Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures.
Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism.
Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less.
Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories.
Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de.
Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
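For orientation, the two named techniques have standard textbook forms, shown here as a sketch. These are the general relations, not formulas quoted from the ms2 paper, and the symbols are assumptions: $\mathbf{v}_i$ is the velocity of molecule $i$, $P_{xy}$ an off-diagonal element of the pressure tensor, $V$ the volume, $T$ the temperature, $k_B$ the Boltzmann constant, and $\Delta U$ the interaction energy of an inserted test molecule with the rest of the system.

```latex
% Green-Kubo relations: transport coefficients as time integrals of
% equilibrium autocorrelation functions.
\begin{align}
  D    &= \frac{1}{3} \int_0^{\infty}
          \left\langle \mathbf{v}_i(0) \cdot \mathbf{v}_i(t) \right\rangle \, dt
          && \text{(self-diffusion coefficient)} \\
  \eta &= \frac{V}{k_B T} \int_0^{\infty}
          \left\langle P_{xy}(0) \, P_{xy}(t) \right\rangle \, dt
          && \text{(shear viscosity)}
\end{align}

% Widom test-molecule method in the canonical (NVT) ensemble:
% excess chemical potential from the average Boltzmann factor of insertion.
\begin{equation}
  \mu^{\mathrm{ex}} = -k_B T \,
    \ln \left\langle \exp\!\left( -\frac{\Delta U}{k_B T} \right) \right\rangle
\end{equation}
```

In practice the time integrals are truncated at a finite correlation time, and test-molecule insertion becomes statistically inefficient at high densities, which is one common motivation for also offering a gradual insertion scheme.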
Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

DOE PAGES

Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...

2017-03-08

Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

Real-Time MENTAT programming language and architecture

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.

1989-01-01

Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.
Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

NASA Technical Reports Server (NTRS)

Weed, Richard Allen; Sankar, L. N.

1994-01-01

An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to develop procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
Communication library for run-time visualization of distributed, asynchronous data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rowlan, J.; Wightman, B.T.

1994-04-01

In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix Sockets, and a real-time connection to the scientific visualization program AVS. Our method is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov from the file portal.tar.Z from the directory pub/portal.
Rule-based programming paradigm: a formal basis for biological, chemical and physical computation.

PubMed

Krishnamurthy, V; Krishnamurthy, E V

1999-03-01

A rule-based programming paradigm is described as a formal basis for biological, chemical and physical computations. In this paradigm, the computations are interpreted as the outcome arising out of interaction of elements in an object space. The interactions can create new elements (or the same elements with modified attributes) or annihilate old elements according to specific rules. Since the interaction rules are inherently parallel, any number of actions can be performed cooperatively or competitively among the subsets of elements, so that the elements evolve toward an equilibrium or unstable or chaotic state. Such an evolution may retain certain invariant properties of the attributes of the elements. The object space resembles a Gibbsian ensemble that corresponds to a distribution of points in the space of positions and momenta (called phase space). It permits the introduction of probabilities in rule applications. As each element of the ensemble changes over time, its phase point is carried into a new phase point. The evolution of this probability cloud in phase space corresponds to a distributed probabilistic computation. Thus, this paradigm can handle deterministic exact computation when the initial conditions are exactly specified and the trajectory of evolution is deterministic. Also, it can handle a probabilistic mode of computation if we want to derive macroscopic or bulk properties of matter. We also explain how to support this rule-based paradigm using relational-database-like query processing and transactions.
g_contacts: Fast contact search in bio-molecular ensemble data

NASA Astrophysics Data System (ADS)

Blau, Christian; Grubmuller, Helmut

2013-12-01

Short-range interatomic interactions govern many bio-molecular processes. Therefore, identifying close interaction partners in ensemble data is an essential task in structural biology and computational biophysics. A contact search can be cast as a typical range search problem for which efficient algorithms have been developed. However, none of those has yet been adapted to the context of macromolecular ensembles, particularly in a molecular dynamics (MD) framework. Here a set-decomposition algorithm is implemented which detects all contacting atoms or residues in maximum O(N log N) run-time, in contrast to the O(N^2) complexity of a brute-force approach.
Catalogue identifier: AEQA_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQA_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 8945
No. of bytes in distributed program, including test data, etc.: 981604
Distribution format: tar.gz
Programming language: C99
Computer: PC
Operating system: Linux
RAM: ≈ Size of input frame
Classification: 3, 4.14
External routines: Gromacs 4.6 [1]
Nature of problem: Finding atoms or residues that are closer to one another than a given cut-off.
Solution method: Excluding distant atoms from distance calculations by decomposing the given set of atoms into disjoint subsets.
Running time: ≤ O(N log N)
References: [1] S. Pronk, S. Pall, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J.C. Smith, P. M. Kasson, D. van der Spoel, B. Hess and Erik Lindahl, Gromacs 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics 29 (7) (2013).
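The contrast between a brute-force search and a decomposition into disjoint subsets can be sketched in Python. Note the swap: this sketch uses a uniform cell grid with cell edge equal to the cut-off, which is a common technique and not necessarily the set-decomposition algorithm implemented in g_contacts. The function names and the random test coordinates are assumptions for illustration.

```python
import random
from collections import defaultdict
from itertools import product

def _dist2(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def contacts_brute_force(points, cutoff):
    """Check every pair: O(N^2) distance evaluations."""
    c2 = cutoff * cutoff
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if _dist2(points[i], points[j]) <= c2}

def contacts_grid(points, cutoff):
    """Bin points into cubic cells of edge `cutoff`, then compare each point
    only with points in its own cell and the 26 adjacent cells."""
    c2 = cutoff * cutoff
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(int(c // cutoff) for c in p)].append(idx)
    found = set()
    for cell, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            # .get avoids creating empty cells while iterating over the dict.
            for j in cells.get(neighbour, ()):
                for i in members:
                    if i < j and _dist2(points[i], points[j]) <= c2:
                        found.add((i, j))
    return found

rng = random.Random(1)
atoms = [tuple(rng.uniform(0.0, 10.0) for _ in range(3)) for _ in range(200)]
pairs = contacts_grid(atoms, cutoff=1.0)
```

Because two points within the cut-off always fall in the same or adjacent cells, the grid version returns exactly the brute-force result while skipping most distant pairs when the density is roughly uniform.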
Multiprogramming performance degradation - Case study on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Dimpsey, R. T.; Iyer, R. K.

1989-01-01

The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test.

PubMed

O'Connor, B P

2000-08-01

Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
Kinetic theory of turbulence for parallel propagation revisited: Formal results

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoon, Peter H., E-mail: yoonp@umd.edu

2015-08-15

In a recent paper, Gaelzer et al. [Phys. Plasmas 22, 032310 (2015)] revisited the second-order nonlinear kinetic theory for turbulence propagating in directions parallel/anti-parallel to the ambient magnetic field. The original theory was developed by Yoon and Fang [Phys. Plasmas 15, 122312 (2008)], but Gaelzer et al. noted that the terms pertaining to discrete-particle effects in Yoon and Fang's theory did not have the proper dimensionality. The purpose of Gaelzer et al. was to restore the dimensional consistency associated with such terms. However, Gaelzer et al. were concerned only with linear wave-particle interaction terms. The present paper completes the analysis by considering the dimensional correction to nonlinear wave-particle interaction terms in the wave kinetic equation.
Parallel multireference configuration interaction calculations on mini-β-carotenes and β-carotene

NASA Astrophysics Data System (ADS)

Kleinschmidt, Martin; Marian, Christel M.; Waletzke, Mirko; Grimme, Stefan

2009-01-01

We present a parallelized version of a direct selecting multireference configuration interaction (MRCI) code [S. Grimme and M. Waletzke, J. Chem. Phys. 111, 5645 (1999)]. The program can be run either in ab initio mode or as a semiempirical procedure combined with density functional theory (DFT/MRCI). We have investigated the efficiency of the parallelization in case studies on carotenoids and porphyrins. The performance is found to depend heavily on the cluster architecture. While the speed-up on the older Intel Netburst technology is close to linear for up to 12-16 processes, our results indicate that it is not favorable to use all cores of modern Intel Dual Core or Quad Core processors simultaneously for memory intensive tasks. Due to saturation of the memory bandwidth, we recommend running less demanding tasks on the latter architectures in parallel with two (Dual Core) or four (Quad Core) MRCI processes per node. The DFT/MRCI branch has been employed to study the low-lying singlet and triplet states of mini-n-β-carotenes (n = 3, 5, 7, 9) and β-carotene (n = 11) at the geometries of the ground state, the first excited triplet state, and the optically bright singlet state. The order of states depends heavily on the conjugation length and the nuclear geometry. The $^1B_u^+$ state constitutes the $S_1$ state in the vertical absorption spectrum of mini-3-β-carotene but switches order with the $2\,^1A_g^-$ state upon excited state relaxation. In the longer carotenes, near degeneracy or even root flipping between the $^1B_u^+$ and $^1B_u^-$ states is observed whereas the $3\,^1A_g^-$ state is found to remain energetically above the optically bright $^1B_u^+$ state at all nuclear geometries investigated here. The DFT/MRCI method is seen to underestimate the absolute excitation energies of the longer mini-β-carotenes but the energy gaps between the excited states are reproduced well. In addition to singlet data, triplet-triplet absorption energies are presented. For β-carotene, where these transition energies are known from experiment, excellent agreement with our calculations is observed.
Parallel multireference configuration interaction calculations on mini-beta-carotenes and beta-carotene.

PubMed

Kleinschmidt, Martin; Marian, Christel M; Waletzke, Mirko; Grimme, Stefan

2009-01-28

We present a parallelized version of a direct selecting multireference configuration interaction (MRCI) code [S. Grimme and M. Waletzke, J. Chem. Phys. 111, 5645 (1999)]. The program can be run either in ab initio mode or as semiempirical procedure combined with density functional theory (DFT/MRCI). We have investigated the efficiency of the parallelization in case studies on carotenoids and porphyrins. The performance is found to depend heavily on the cluster architecture. While the speed-up on the older Intel Netburst technology is close to linear for up to 12-16 processes, our results indicate that it is not favorable to use all cores of modern Intel Dual Core or Quad Core processors simultaneously for memory intensive tasks. Due to saturation of the memory bandwidth, we recommend to run less demanding tasks on the latter architectures in parallel to two (Dual Core) or four (Quad Core) MRCI processes per node. The DFT/MRCI branch has been employed to study the low-lying singlet and triplet states of mini-n-beta-carotenes (n=3, 5, 7, 9) and beta-carotene (n=11) at the geometries of the ground state, the first excited triplet state, and the optically bright singlet state. The order of states depends heavily on the conjugation length and the nuclear geometry. The (1)B(u) (+) state constitutes the S(1) state in the vertical absorption spectrum of mini-3-beta-carotene but switches order with the 2 (1)A(g) (-) state upon excited state relaxation. In the longer carotenes, near degeneracy or even root flipping between the (1)B(u) (+) and (1)B(u) (-) states is observed whereas the 3 (1)A(g) (-) state is found to remain energetically above the optically bright (1)B(u) (+) state at all nuclear geometries investigated here. The DFT/MRCI method is seen to underestimate the absolute excitation energies of the longer mini-beta-carotenes but the energy gaps between the excited states are reproduced well. 
In addition to singlet data, triplet-triplet absorption energies are presented. For beta-carotene, where these transition energies are known from experiment, excellent agreement with our calculations is observed.
Message Passing and Shared Address Space Parallelism on an SMP Cluster

NASA Technical Reports Server (NTRS)

Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

NASA Technical Reports Server (NTRS)

Harper, Richard

1989-01-01

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
The revised solar array synthesis computer program

NASA Technical Reports Server (NTRS)

1970-01-01

The Revised Solar Array Synthesis Computer Program is described. It is a general-purpose program which computes solar array output characteristics while accounting for the effects of temperature, incidence angle, charged-particle irradiation, and other degradation effects on various solar array configurations in either circular or elliptical orbits. Array configurations may consist of up to 75 solar cell panels arranged in any series-parallel combination not exceeding three series-connected panels in a parallel string and no more than 25 parallel strings in an array. Up to 100 separate solar array current-voltage characteristics, corresponding to 100 equal-time increments during the sunlight illuminated portion of an orbit or any 100 user-specified combinations of incidence angle and temperature, can be computed and printed out during one complete computer execution. Individual panel incidence angles may be computed and printed out at the user's option.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

NASA Technical Reports Server (NTRS)

Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

1994-01-01

The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Parallel computation and the Basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1992-12-16

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
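The master-and-slaves paradigm with domain decomposition described in this abstract can be illustrated with a minimal sketch. It is not PROTOPAR's interface: the function names `split_domain`, `worker_sum`, and `master_sum` are hypothetical, and a Python thread pool stands in for PVM message passing purely to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def split_domain(n_cells, n_workers):
    """Split cell indices 0..n_cells-1 into contiguous, near-equal subdomains."""
    base, extra = divmod(n_cells, n_workers)
    bounds = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional cell each.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def worker_sum(field, bounds):
    """Work done by one worker: reduce only its own subdomain."""
    lo, hi = bounds
    return sum(field[lo:hi])


def master_sum(field, n_workers):
    """Master: partition the domain, hand out subdomains, combine the replies."""
    subdomains = split_domain(len(field), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Assumption: a thread pool replaces the send/receive pairs of a
        # real message-passing library.
        partials = list(pool.map(lambda b: worker_sum(field, b), subdomains))
    return sum(partials)
```

Keeping the partitioning logic in one small function is one way to hide the data layout from the individual packages, which is the kind of independence the abstract emphasizes.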
Boys in India challenge gender stereotypes.

PubMed

1998-01-01

This article discusses CEDPA's Better Life Options Program in India. The program was initiated in 1987 to challenge gender inequities. The program offers over 400,000 girls a chance to develop skills and self-confidence for increasing their options in education, social mobility, work, health, and family and community roles. CEDPA's partner, Prerana, offers an integrated program that provides literacy training, vocational skills, after-school tutoring, health education, and family life education for about 600 girls/year. Vocational training includes nontraditional skills, such as video production and electronics. Prerana established a parallel program in 1994 for boys and young men that aims to change attitudes about girls and women and traditional gender roles. The program offers vocational skills, such as cooking and candle-making. Family life education teaches gender awareness and provides counseling and services for reproductive health. The Prerana program emphasizes men's shared responsibility in parenthood and sexual behavior, shared contribution to family income, health and nutrition, and prevention of violence against women. Since 1994, the program has included 1200 boys in 6 villages in New Delhi. Boys' enrollment is increasing; several young men have volunteered to become depot holders of contraceptive supplies in their villages. For example, one young man who was part of the Prerana program went on to be a depot holder and then a family planning promoter and counselor. He interacts with both young and older men. His contributions were well received by his village.
Orthorectification by Using GPGPU Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power exceeding one teraflops. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with much faster computation and higher memory bandwidth than central processing units (CPUs). Data-parallel computation can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capabilities has attracted the attention of researchers dealing with complex problems that require intensive calculations. This interest gave rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful yet inexpensive hardware, so they have become an alternative to CPUs. Graphics chips, which used to be standard application hardware, have been transformed into modern, powerful and programmable processors to meet general needs. Especially in recent years, the use of graphics processing units in general-purpose computation has led researchers and developers to this point. The biggest problem is that graphics processing units use programming models that differ from conventional programming methods. Therefore, efficient GPU programming requires re-coding the existing algorithm with the limitations and structure of the graphics hardware in mind. Currently, multi-core processors cannot be programmed using traditional programming methods; the event-procedure programming method cannot be used for them. GPUs are especially effective when the same computing steps are repeated for many data elements and high accuracy is needed. 
Thus, they carry out the computation more quickly and accurately. Compared with GPUs, CPUs, which perform just one computation at a time according to the flow control, are slower. This structure can be used in various applications of computer technology. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm is coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with those of traditional CPU programming. In the other application, projective rectification is coded using the GPGPU method and the CUDA programming language. The results of the program were evaluated on sample images of various sizes. The GPGPU method is especially useful when the same computations are repeated on highly dense data, yielding the solution quickly.
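The idea of mapping data elements to parallel threads can be made concrete with projective rectification, where every output pixel is transformed independently of all others. The sketch below is plain Python rather than the authors' CUDA code; the function names and the example matrix are hypothetical, and the loop iterations merely play the role that one GPU thread per pixel would play.

```python
def apply_homography(h, x, y):
    """Map one point (x, y) through the 3x3 projective matrix h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def rectify_grid(h, width, height):
    """Compute the source coordinates for every output pixel.

    No pixel depends on any other, so on a GPU each (row, col) pair could
    be handled by its own thread; here they are ordinary loop iterations.
    """
    return [[apply_homography(h, col, row) for col in range(width)]
            for row in range(height)]


# Hypothetical example matrix: a pure translation by (+2, +3).
shift = [[1, 0, 2], [0, 1, 3], [0, 0, 1]]
grid = rectify_grid(shift, 2, 2)
```

Because the per-pixel function has no shared state, the same body could in principle be moved into a GPU kernel with the loop indices replaced by thread indices.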
USRA/RIACS

NASA Technical Reports Server (NTRS)

Oliger, Joseph

1992-01-01

The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities with research programs in the aerospace sciences, under a cooperative agreement with NASA. The primary mission of RIACS is to provide research and expertise in computer science and scientific computing to support the scientific missions of NASA ARC. The research carried out at RIACS must change its emphasis from year to year in response to NASA ARC's changing needs and technological opportunities. A flexible scientific staff is provided through a university faculty visitor program, a post doctoral program, and a student visitor program. Not only does this provide appropriate expertise but it also introduces scientists outside of NASA to NASA problems. A small group of core RIACS staff provides continuity and interacts with an ARC technical monitor and scientific advisory group to determine the RIACS mission. RIACS activities are reviewed and monitored by a USRA advisory council and ARC technical monitor. Research at RIACS is currently being done in the following areas: (1) parallel computing; (2) advanced methods for scientific computing; (3) learning systems; (4) high performance networks and technology; and (5) graphics, visualization, and virtual environments. In the past year, parallel compiler techniques and adaptive numerical methods for flows in complicated geometries were identified as important problems to investigate for ARC's involvement in the Computational Grand Challenges of the next decade. We concluded a summer student visitors program during this six months. We had six visiting graduate students that worked on projects over the summer and presented seminars on their work at the conclusion of their visits. 
RIACS technical reports are usually preprints of manuscripts that have been submitted to research journals or conference proceedings. A list of these reports for the period July 1, 1992 through December 31, 1992 is provided.
Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

PubMed

Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

2016-08-05

The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented in the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program: a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc.
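Determining the Fermi level means finding the chemical potential at which the summed orbital occupations equal the electron count. The abstract does not spell out its interpolation-based algorithm, so the sketch below swaps in plain bisection on Fermi-Dirac occupations; the function names, the spin-degenerate occupation of 2, and the bracketing margin are all assumptions made for illustration.

```python
import math


def occupation(energy, mu, kt):
    """Fermi-Dirac occupation of one spin-degenerate orbital (0 to 2 electrons)."""
    x = (energy - mu) / kt
    # Clamp the exponent so math.exp cannot overflow for far-off levels.
    if x > 700.0:
        return 0.0
    if x < -700.0:
        return 2.0
    return 2.0 / (1.0 + math.exp(x))


def electron_count(energies, mu, kt):
    """Total number of electrons for a trial Fermi level mu."""
    return sum(occupation(e, mu, kt) for e in energies)


def fermi_level(energies, n_electrons, kt, tol=1e-10):
    """Bisection on mu; assumes 0 < n_electrons < 2 * len(energies)."""
    lo = min(energies) - 100.0 * kt  # essentially empty at this mu
    hi = max(energies) + 100.0 * kt  # essentially full at this mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if electron_count(energies, mid, kt) < n_electrons:
            lo = mid  # too few electrons: raise the Fermi level
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In a divide-and-conquer setting the sum in `electron_count` runs over orbitals from all subsystems, which is why each evaluation implies a global reduction and why a parallel-friendly scheme matters.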

Automated Performance Prediction of Message-Passing Parallel Programs

NASA Technical Reports Server (NTRS)

Block, Robert J.; Sarukkai, Sekhar; Mehra, Pankaj; Woodrow, Thomas S. (Technical Monitor)

1995-01-01

The increasing use of massively parallel supercomputers to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. Much work has been done to create simple models that represent important characteristics of parallel programs, such as latency, network contention, and communication volume. But many of these methods still require substantial manual effort to represent an application in the model's format. The MK toolkit described in this paper is the result of an on-going effort to automate the formation of analytic expressions of program execution time, with a minimum of programmer assistance. In this paper we demonstrate the feasibility of our approach, by extending previous work to detect and model communication patterns automatically, with and without overlapped computations. The predictions derived from these models agree, within reasonable limits, with execution times of programs measured on the Intel iPSC/860 and Paragon. Further, we demonstrate the use of MK in selecting optimal computational grain size and studying various scalability metrics.
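An analytic expression of execution time, of the general kind the abstract refers to, can be sketched as a compute term that shrinks with the process count plus a communication term that grows with it. The model below is a textbook-style assumption (tree-structured communication with a latency and a per-byte cost), not an expression produced by the toolkit, and every name and parameter value is hypothetical.

```python
import math


def predicted_time(n_items, n_procs, t_item, latency, t_byte, msg_bytes):
    """Predicted run time: perfectly divided work plus a tree of messages."""
    compute = n_items * t_item / n_procs
    # Assumption: one message per level of a binary tree over the processes.
    steps = math.ceil(math.log2(n_procs)) if n_procs > 1 else 0
    communicate = steps * (latency + t_byte * msg_bytes)
    return compute + communicate


def best_proc_count(n_items, candidates, **machine):
    """Pick the candidate process count with the smallest predicted time."""
    return min(candidates,
               key=lambda p: predicted_time(n_items, p, **machine))


# Hypothetical machine parameters, chosen only to show the trade-off.
choice = best_proc_count(1000, [1, 2, 4, 8, 16, 32],
                         t_item=1e-3, latency=0.1, t_byte=0.0, msg_bytes=0)
```

With these made-up parameters the predicted time falls until 8 processes and rises afterwards, which is the grain-size trade-off in miniature: finer grains reduce compute time per process but add communication steps.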
Investigation on the individual contributions of N-H...O=C and C-H...O=C interactions to the binding energies of beta-sheet models.

PubMed

Wang, Chang-Sheng; Sun, Chang-Liang

2010-04-15

In this article, the binding energies of 16 antiparallel and parallel beta-sheet models are estimated using the analytic potential energy function we proposed recently and the results are compared with those obtained from MP2, AMBER99, OPLSAA/L, and CHARMM27 calculations. The comparisons indicate that the analytic potential energy function can produce reasonable binding energies for beta-sheet models. Further comparisons suggest that the binding energy of the beta-sheet models might come mainly from dipole-dipole attractive and repulsive interactions and VDW interactions between the two strands. The dipole-dipole attractive and repulsive interactions are further obtained in this article. The total of N-H...H-N and C=O...O=C dipole-dipole repulsive interaction (the secondary electrostatic repulsive interaction) in the small ring of the antiparallel beta-sheet models is estimated to be about 6.0 kcal/mol. The individual N-H...O=C dipole-dipole attractive interaction is predicted to be -6.2 +/- 0.2 kcal/mol in the antiparallel beta-sheet models and -5.2 +/- 0.6 kcal/mol in the parallel beta-sheet models. The individual C(alpha)-H...O=C attractive interaction is -1.2 +/- 0.2 kcal/mol in the antiparallel beta-sheet models and -1.5 +/- 0.2 kcal/mol in the parallel beta-sheet models. These values are important in understanding the interactions at protein-protein interfaces and developing a more accurate force field for peptides and proteins. © 2009 Wiley Periodicals, Inc.
Connectionist Models and Parallelism in High Level Vision.

DTIC Science & Technology

1985-01-01

Jerome A. Feldman. Grant number N00014-82-K-0193. Computer Science...Connectionist Models 2.1 Background and Overview. Computer science is just beginning to look seriously at parallel computation: it may turn out that...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These
Transient Finite Element Computations on a Variable Transputer System

NASA Technical Reports Server (NTRS)

Smolinski, Patrick J.; Lapczyk, Ireneusz

1993-01-01

A parallel program to analyze transient finite element problems was written and implemented on a system of transputer processors. The program uses the explicit time integration algorithm which eliminates the need for equation solving, making it more suitable for parallel computations. An interprocessor communication scheme was developed for arbitrary two dimensional grid processor configurations. Several 3-D problems were analyzed on a system with a small number of processors.
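The reason explicit time integration needs no equation solving can be seen in a small sketch: with a lumped (diagonal) mass matrix, each nodal acceleration is just force divided by mass. The example below uses a 1-D chain of equal springs and the leapfrog form of the central difference scheme; it is an illustration under those assumptions, not the authors' transputer code, and all names are hypothetical.

```python
def internal_forces(u, stiffness):
    """Assemble spring forces element by element; no global matrix is formed."""
    f = [0.0] * len(u)
    for e in range(len(u) - 1):
        stretch = u[e + 1] - u[e]
        f[e] += stiffness * stretch      # stretched spring pulls node e forward
        f[e + 1] -= stiffness * stretch  # and pulls node e+1 back
    return f


def explicit_step(u, v_half, masses, stiffness, dt):
    """Advance one step: v(n+1/2) = v(n-1/2) + dt*a(n), u(n+1) = u(n) + dt*v(n+1/2)."""
    f = internal_forces(u, stiffness)
    # Lumped mass: acceleration is a per-node division, not a linear solve.
    v_new = [vi + dt * fi / mi for vi, fi, mi in zip(v_half, f, masses)]
    u_new = [ui + dt * vi for ui, vi in zip(u, v_new)]
    return u_new, v_new


# Two free nodes joined by one stretched spring, released from rest.
u, v = [0.0, 1.0], [0.0, 0.0]
for _ in range(100):
    u, v = explicit_step(u, v, masses=[1.0, 1.0], stiffness=1.0, dt=0.01)
```

Since each node update uses only neighbouring displacements, processors that own different parts of the mesh need to exchange values only along subdomain boundaries, which is what makes the scheme attractive for parallel machines.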
Scheduling for Locality in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to...decomposition and scheduling algorithms. Subject terms: shared-memory multiprocessors; architecture trends; loop scheduling. Number of pages: 110.
Force user's manual, revised

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

1987-01-01

A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
Parallel Programming Paradigms

DTIC Science & Technology

1987-07-01

Unclassified. Distribution statement (of this report): Distribution of this report is unlimited. ...8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328. Parallel Programming Paradigms...processors to fetch from the same memory cell (list head) and thus seems to favor a shared memory implementation [37]. In this dissertation, we
Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

PubMed

Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

2013-12-01

Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.
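The dynamic programming idea underlying these boundary detectors can be shown for the simplest case, a single boundary traced column by column through a cost image. This is only the basic building block, not DDP, PL-DDP, or the proposed DLD; the function name, the cost convention (low cost where an edge is likely), and the `max_jump` smoothness limit are assumptions.

```python
def detect_boundary(cost, max_jump=1):
    """Return one row index per column that minimises the summed cost.

    cost[r][c] is low where an edge is likely; the boundary may move by
    at most max_jump rows between neighbouring columns.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best = [[0.0] * n_cols for _ in range(n_rows)]
    back = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        best[r][0] = cost[r][0]
    for c in range(1, n_cols):
        for r in range(n_rows):
            lo, hi = max(0, r - max_jump), min(n_rows - 1, r + max_jump)
            # Cheapest admissible predecessor in the previous column.
            prev = min(range(lo, hi + 1), key=lambda p: best[p][c - 1])
            best[r][c] = cost[r][c] + best[prev][c - 1]
            back[r][c] = prev
    # Backtrack from the cheapest end point in the last column.
    r = min(range(n_rows), key=lambda p: best[p][n_cols - 1])
    path = [r]
    for c in range(n_cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    path.reverse()
    return path
```

Detecting two nearly parallel boundaries at once, as the abstract describes, enlarges the state from one row index to a pair of row indices (or, for DLD, to line-segment parameters), which is where the efficiency concerns arise.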
Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

NASA Astrophysics Data System (ADS)

Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.

2013-12-01

A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
Fast, Parallel and Secure Cryptography Algorithm Using Lorenz's Attractor

NASA Astrophysics Data System (ADS)

Marco, Anderson Gonçalves; Martinez, Alexandre Souto; Bruno, Odemir Martinez

A novel cryptography method based on the Lorenz's attractor chaotic system is presented. The proposed algorithm is secure and fast, making it practical for general use. We introduce the chaotic operation mode, which provides an interaction among the password, message and a chaotic system. It ensures that the algorithm yields a secure codification, even if the nature of the chaotic system is known. The algorithm has been implemented in two versions: one sequential and slow and the other, parallel and fast. Our algorithm assures the integrity of the ciphertext (we know if it has been altered, which is not assured by traditional algorithms) and consequently its authenticity. Numerical experiments are presented, discussed and show the behavior of the method in terms of security and performance. The fast version of the algorithm has a performance comparable to AES, a popular cryptography program used commercially nowadays, but it is more secure, which makes it immediately suitable for general purpose cryptography applications. An internet page has been set up, which enables the readers to test the algorithm and also to try to break into the cipher.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Shipman, Galen M.

These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

NASA Astrophysics Data System (ADS)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. 
Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
Inquiry in interaction: How local adaptations of curricula shape classroom communities

NASA Astrophysics Data System (ADS)

Enyedy, Noel; Goldberg, Jennifer

2004-11-01

In this study, we seek a better understanding of how individuals and their daily interactions shape and reshape social structures that constitute a classroom community. Moreover, we provide insight into how discourse and classroom interactions shape the nature of a learning community, as well as which aspects of the classroom culture may be consequential for learning. The participants in this study include two teachers who are implementing a new environmental science program, Global Learning through Observation to Benefit the Environment (GLOBE), and interacting with 54 children in an urban middle school. Both qualitative and quantitative data are analyzed and presented. To gain a better understanding of the inquiry teaching within classroom communities, we compare and contrast the discourse and interactions of the two teachers during three parallel environmental science lessons. The focus of our analysis includes (1) how the community identifies the object or goal of its activity; and (2) how the rights, rules, and roles for members are established and inhabited in interaction. Quantitative analyses of student pre- and posttests suggest greater learning for students in one classroom over the other, providing support for the influence of the classroom community and interactional choices of the teacher on student learning. Implications of the findings from this study are discussed in the context of curricular design, professional development, and educational reform. © 2004 Wiley Periodicals, Inc. J Res Sci Teach 41: 905-935, 2004.
Soap films and GeoGebra in the study of Fermat and Steiner points

NASA Astrophysics Data System (ADS)

Flores, Alfinio; Park, Jungeun

2018-05-01

We discuss how mathematics and secondary mathematics education majors developed an understanding of Fermat points for the triangle as well as Steiner points for the square and regular pentagon, and also of soap film configurations between parallel plates where forces are in equilibrium. The activities included the use of soap films and the interactive geometry program GeoGebra. Students worked in small groups using these tools to investigate the properties of Fermat and Steiner points and then justified the results of their investigations using geometrical arguments. These activities are specific approaches of how to encourage prospective teachers to use physical experiments to support students' development of mathematical curiosity and mathematical justifications.
6th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG) Applications and Beyond, April 25–27, 2013, Riga, Latvia

PubMed Central

Schlaeger, Christof; Hinzmann, Rolf

2013-01-01

Abstract International experts in the fields of diabetes, diabetes technology, endocrinology, and pediatrics gathered for the 6th Annual Symposium on Self-Monitoring of Blood Glucose (SMBG) Applications and beyond. The aim of this meeting was to continue setting up a global network of experts in this field and provide an international platform for exchange of ideas to improve life for people with diabetes. The 2013 meeting comprised a comprehensive scientific program, parallel interactive workshops, and two keynote lectures. All these discussions were intended to help identify gaps and areas where further scientific work and clinical studies are warranted. PMID:24074038
Parallel changes of taxonomic interaction networks in lacustrine bacterial communities induced by a polymetallic perturbation

PubMed Central

Laplante, Karine; Sébastien, Boutin; Derome, Nicolas

2013-01-01

Heavy metals released by anthropogenic activities such as mining trigger profound changes to bacterial communities. In this study we used 16S SSU rRNA gene high-throughput sequencing to characterize the impact of a polymetallic perturbation and other environmental parameters on taxonomic networks within five lacustrine bacterial communities from sites located near Rouyn-Noranda, Quebec, Canada. The results showed that community equilibrium was disturbed in terms of both diversity and structure. Moreover, heavy metals, especially cadmium combined with water acidity, induced parallel changes among sites via the selection of resistant OTUs (Operational Taxonomic Unit) and taxonomic dominance perturbations favoring the Alphaproteobacteria. Furthermore, under a similar selective pressure, covariation trends between phyla revealed conservation and parallelism within interphylum interactions. Our study sheds light on the importance of analyzing communities not only from a phylogenetic perspective but also including a quantitative approach to provide significant insights into the evolutionary forces that shape the dynamic of the taxonomic interaction networks in bacterial communities. PMID:23789031
Parallel Force Assay for Protein-Protein Interactions

PubMed Central

Aschenbrenner, Daniela; Pippig, Diana A.; Klamecka, Kamila; Limmer, Katja; Leonhardt, Heinrich; Gaub, Hermann E.

2014-01-01

Quantitative proteome research is greatly promoted by high-resolution parallel format assays. A characterization of protein complexes based on binding forces offers an unparalleled dynamic range and allows for the effective discrimination of non-specific interactions. Here we present a DNA-based Molecular Force Assay to quantify protein-protein interactions, namely the bond between different variants of GFP and GFP-binding nanobodies. We present different strategies to adjust the maximum sensitivity window of the assay by influencing the binding strength of the DNA reference duplexes. The binding of the nanobody Enhancer to the different GFP constructs is compared at high sensitivity of the assay. Whereas the binding strength to wild type and enhanced GFP are equal within experimental error, stronger binding to superfolder GFP is observed. This difference in binding strength is attributed to alterations in the amino acids that form contacts according to the crystal structure of the initial wild type GFP-Enhancer complex. Moreover, we outline the potential for large-scale parallelization of the assay. PMID:25546146
Parallel force assay for protein-protein interactions.

PubMed

Aschenbrenner, Daniela; Pippig, Diana A; Klamecka, Kamila; Limmer, Katja; Leonhardt, Heinrich; Gaub, Hermann E

2014-01-01

Quantitative proteome research is greatly promoted by high-resolution parallel format assays. A characterization of protein complexes based on binding forces offers an unparalleled dynamic range and allows for the effective discrimination of non-specific interactions. Here we present a DNA-based Molecular Force Assay to quantify protein-protein interactions, namely the bond between different variants of GFP and GFP-binding nanobodies. We present different strategies to adjust the maximum sensitivity window of the assay by influencing the binding strength of the DNA reference duplexes. The binding of the nanobody Enhancer to the different GFP constructs is compared at high sensitivity of the assay. Whereas the binding strength to wild type and enhanced GFP are equal within experimental error, stronger binding to superfolder GFP is observed. This difference in binding strength is attributed to alterations in the amino acids that form contacts according to the crystal structure of the initial wild type GFP-Enhancer complex. Moreover, we outline the potential for large-scale parallelization of the assay.
Northeast Artificial Intelligence Consortium Annual Report - 1988 Parallel Vision. Volume 9

DTIC Science & Technology

1989-10-01

supports the Northeast Artificial Intelligence Consortium (NAIC). Volume 9 Parallel Vision Report submitted by Christopher M. Brown Randal C. Nelson...NORTHEAST ARTIFICIAL INTELLIGENCE CONSORTIUM ANNUAL REPORT - 1988 Parallel Vision Syracuse University Christopher M. Brown and Randal C. Nelson...Technical Director Directorate of Intelligence & Reconnaissance FOR THE COMMANDER: IGOR G. PLONISCH Directorate of Plans & Programs If your address has
High Performance Input/Output for Parallel Computer Systems

NASA Technical Reports Server (NTRS)

Ligon, W. B.

1996-01-01

The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three-year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2 and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.

The Research of the Parallel Computing Development from the Angle of Cloud Computing

NASA Astrophysics Data System (ADS)

Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

2017-10-01

Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and MapReduce respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
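As an illustration of the MapReduce model named in the abstract above, the following is a minimal single-process sketch of the map, shuffle and reduce phases applied to word counting. It is not taken from the paper; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real framework would run the map and reduce calls on separate workers.

```python
from collections import defaultdict


def map_phase(document):
    """Emit one (word, 1) pair per word of a single input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key; here the combination is a sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases sequentially over a list of input splits."""
    pairs = []
    for document in documents:
        # Each call is independent, so a framework could run them in parallel.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["MPI OpenMP MPI", "OpenMP MapReduce"]))
```

The point of the sketch is that the programmer supplies only the map and reduce functions; distribution, grouping and fault handling are left to the framework, in contrast to MPI, where the communication is written explicitly.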
A real-time MPEG software decoder using a portable message-passing library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment in which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitation, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-01-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-09-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Support of Multidimensional Parallelism in the OpenMP Programming Model

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele

2003-01-01

OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
Parallel and Multivalued Logic by the Two-Dimensional Photon-Echo Response of a Rhodamine–DNA Complex

PubMed Central

2015-01-01

Implementing parallel and multivalued logic operations at the molecular scale has the potential to improve the miniaturization and efficiency of a new generation of nanoscale computing devices. Two-dimensional photon-echo spectroscopy is capable of resolving dynamical pathways on electronic and vibrational molecular states. We experimentally demonstrate the implementation of molecular decision trees, logic operations where all possible values of inputs are processed in parallel and the outputs are read simultaneously, by probing the laser-induced dynamics of populations and coherences in a rhodamine dye mounted on a short DNA duplex. The inputs are provided by the bilinear interactions between the molecule and the laser pulses, and the output values are read from the two-dimensional molecular response at specific frequencies. Our results highlight how ultrafast dynamics between multiple molecular states induced by light–matter interactions can be used as an advantage for performing complex logic operations in parallel, operations that are faster than electrical switching. PMID:25984269
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted to date.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE PAGES

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...

2017-11-14

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted to date.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

NASA Astrophysics Data System (ADS)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.

2017-11-01

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted to date.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dritz, K.W.; Boyle, J.M.

This paper addresses the problem of measuring and analyzing the performance of fine-grained parallel programs running on shared-memory multiprocessors. Such processors use locking (either directly in the application program, or indirectly in a subroutine library or the operating system) to serialize accesses to global variables. Given sufficiently high rates of locking, the chief factor preventing linear speedup (besides lack of adequate inherent parallelism in the application) is lock contention - the blocking of processes that are trying to acquire a lock currently held by another process. We show how a high-resolution, low-overhead clock may be used to measure both lock contention and lack of parallel work. Several ways of presenting the results are covered, culminating in a method for calculating, in a single multiprocessing run, both the speedup actually achieved and the speedup lost to contention for each lock and to lack of parallel work. The speedup losses are reported in the same units, "processor-equivalents," as the speedup achieved. Both are obtained without having to perform the usual one-process comparison run. We also chronicle a variety of experiments motivated by actual results obtained with our measurement method. The insights into program performance that we gained from these experiments helped us to refine the parts of our programs concerned with communication and synchronization. Ultimately these improvements reduced lock contention to a negligible amount and yielded nearly linear speedup in applications not limited by lack of parallel work. We describe two generally applicable strategies ("code motion out of critical regions" and "critical-region fissioning") for reducing lock contention and one ("lock/variable fusion") applicable only on certain architectures.
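The measurement idea in the abstract above (timing how long processes block on a lock) can be sketched with a lock wrapper that reads a high-resolution clock around each acquisition. This is an illustration only, not the authors' instrumentation: the TimedLock class and run function are hypothetical, Python threads stand in for processes, and the "processor-equivalents lost" figure is assumed here to be the total time spent blocked divided by the wall-clock time of the run.

```python
import threading
import time


class TimedLock:
    """A lock that accumulates the time callers spend blocked on it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.wait_seconds = 0.0

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        # The lock is now held, so updating the shared total is safe.
        self.wait_seconds += time.perf_counter() - start
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def run(num_threads=4, iterations=50, hold_seconds=0.001):
    """Return (final count, wall time, processor-equivalents lost)."""
    lock = TimedLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            with lock:
                counter[0] += 1
                time.sleep(hold_seconds)  # simulated work inside the critical region

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    wall = time.perf_counter() - start
    # Assumed definition: blocked time summed over threads, per unit of wall time.
    return counter[0], wall, lock.wait_seconds / wall


if __name__ == "__main__":
    count, wall, lost = run()
    print(f"count={count} wall={wall:.3f}s lost={lost:.2f} processor-equivalents")
```

Moving the simulated work out of the `with lock:` block is the analogue of "code motion out of critical regions": the counter update stays protected, and the reported loss should drop accordingly.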
Stress and decision making: neural correlates of the interaction between stress, executive functions, and decision making under risk.

PubMed

Gathmann, Bettina; Schulte, Frank P; Maderwald, Stefan; Pawlikowski, Mirko; Starcke, Katrin; Schäfer, Lena C; Schöler, Tobias; Wolf, Oliver T; Brand, Matthias

2014-03-01

Stress and additional load on the executive system, produced by a parallel working memory task, impair decision making under risk. However, the combination of stress and a parallel task seems to preserve the decision-making performance [e.g., operationalized by the Game of Dice Task (GDT)] from decreasing, probably by a switch from serial to parallel processing. The question remains how the brain manages such demanding decision-making situations. The current study used a 7-tesla magnetic resonance imaging (MRI) system in order to investigate the underlying neural correlates of the interaction between stress (induced by the Trier Social Stress Test), risky decision making (GDT), and a parallel executive task (2-back task) to get a better understanding of those behavioral findings. The results show that on a behavioral level, stressed participants did not show significant differences in task performance. Interestingly, when comparing the stress group (SG) with the control group, the SG showed a greater increase in neural activation in the anterior prefrontal cortex when performing the 2-back task simultaneously with the GDT than when performing each task alone. This brain area is associated with parallel processing. Thus, the results may suggest that in stressful dual-tasking situations, where a decision has to be made when in parallel working memory is demanded, a stronger activation of a brain area associated with parallel processing takes place. The findings are in line with the idea that stress seems to trigger a switch from serial to parallel processing in demanding dual-tasking situations.
NLSEmagic: Nonlinear Schrödinger equation multi-dimensional Matlab-based GPU-accelerated integrators using compact high-order schemes

NASA Astrophysics Data System (ADS)

Caplan, R. M.

2013-04-01

We present a simple-to-use yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphic processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 124453 No. of bytes in distributed program, including test data, etc.: 4728604 Distribution format: tar.gz Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: Single CPU, number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: Highly dependent on dimensionality and grid size. For typical medium-large problem size in three dimensions, 4GB is sufficient. Keywords: Nonlinear Schrödinger Equation, GPU, high-order finite difference, Bose-Einstein condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation.
Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time and both second- and fourth-order differencing in space. The integrators are written to run on NVIDIA GPUs and are interfaced with MATLAB including built-in visualization and analysis tools. Restrictions: The main restriction for the GPU integrators is the amount of RAM on the GPU as the code is currently only designed for running on a single GPU. Unusual features: Ability to visualize real-time simulations through the interaction of MATLAB and the compiled GPU integrators. Additional comments: Setup guide and Installation guide provided. Program has a dedicated web site at www.nlsemagic.com. Running time: A three-dimensional run with a grid dimension of 87×87×203 for 3360 time steps (100 non-dimensional time units) takes about one and a half minutes on a GeForce GTX 580 GPU card.
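The solution method described above (explicit fourth-order Runge-Kutta in time, second-order central differencing in space) can be sketched for the one-dimensional case as follows. This is a serial, pure-Python illustration and not NLSEmagic code. It assumes the equation is written as i ψ_t + a ψ_xx - s |ψ|² ψ = 0, uses periodic boundaries for simplicity, and the names rhs and rk4_step are hypothetical.

```python
import cmath


def rhs(psi, dx, a=1.0, s=1.0):
    """Time derivative psi_t = i * (a * psi_xx - s * |psi|^2 * psi).

    psi_xx uses the second-order central difference on a periodic grid.
    """
    n = len(psi)
    out = []
    for j in range(n):
        lap = (psi[(j + 1) % n] - 2 * psi[j] + psi[j - 1]) / dx ** 2
        out.append(1j * (a * lap - s * abs(psi[j]) ** 2 * psi[j]))
    return out


def rk4_step(psi, dt, dx, a=1.0, s=1.0):
    """Advance psi by one classical fourth-order Runge-Kutta step."""

    def shifted(base, slope, factor):
        return [b + factor * k for b, k in zip(base, slope)]

    k1 = rhs(psi, dx, a, s)
    k2 = rhs(shifted(psi, k1, dt / 2), dx, a, s)
    k3 = rhs(shifted(psi, k2, dt / 2), dx, a, s)
    k4 = rhs(shifted(psi, k3, dt), dx, a, s)
    return [
        p + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
        for p, q1, q2, q3, q4 in zip(psi, k1, k2, k3, k4)
    ]


if __name__ == "__main__":
    # A spatially uniform state of amplitude 1 only rotates in phase:
    # psi(t) = exp(-i * s * t) under the assumed form of the equation.
    psi = [1.0 + 0.0j] * 16
    for _ in range(100):
        psi = rk4_step(psi, dt=0.001, dx=0.1)
    print(abs(psi[0] - cmath.exp(-0.1j)))
```

Each grid point's update depends only on its two neighbours at the previous stage, which is what makes this kind of scheme a natural fit for one-thread-per-point GPU execution.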
ICASE Computer Science Program

NASA Technical Reports Server (NTRS)

1985-01-01

The Institute for Computer Applications in Science and Engineering computer science program is discussed in outline form. Information is given on such topics as problem decomposition, algorithm development, programming languages, and parallel architectures.
The neural basis of parallel saccade programming: an fMRI study.

PubMed

Hu, Yanbo; Walker, Robin

2011-11-01

The neural basis of parallel saccade programming was examined in an event-related fMRI study using a variation of the double-step saccade paradigm. Two double-step conditions were used: one enabled the second saccade to be partially programmed in parallel with the first saccade while in a second condition both saccades had to be prepared serially. The intersaccadic interval, observed in the parallel programming (PP) condition, was significantly reduced compared with latency in the serial programming (SP) condition and also to the latency of single saccades in control conditions. The fMRI analysis revealed greater activity (BOLD response) in the frontal and parietal eye fields for the PP condition compared with the SP double-step condition and when compared with the single-saccade control conditions. By contrast, activity in the supplementary eye fields was greater for the double-step condition than the single-step condition but did not distinguish between the PP and SP requirements. The role of the frontal eye fields in PP may be related to the advanced temporal preparation and increased salience of the second saccade goal that may mediate activity in other downstream structures, such as the superior colliculus. The parietal lobes may be involved in the preparation for spatial remapping, which is required in double-step conditions. The supplementary eye fields appear to have a more general role in planning saccade sequences that may be related to error monitoring and the control over the execution of the correct sequence of responses.
Supercomputing '91; Proceedings of the 4th Annual Conference on High Performance Computing, Albuquerque, NM, Nov. 18-22, 1991

NASA Technical Reports Server (NTRS)

1991-01-01

Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

1994-01-01

An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

NASA Technical Reports Server (NTRS)

Woo, Alex C.; Hill, Kueichien C.

1996-01-01

The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Optimized and parallelized implementation of the electronegativity equalization method and the atom-bond electronegativity equalization method.

PubMed

Vareková, R Svobodová; Koca, J

2006-02-01

The most common way to calculate charge distribution in a molecule is ab initio quantum mechanics (QM). Some faster alternatives to QM have also been developed, the so-called "equalization methods" EEM and ABEEM, which are based on DFT. We have implemented and optimized the EEM and ABEEM methods and created the EEM SOLVER and ABEEM SOLVER programs. It has been found that the most time-consuming part of equalization methods is the reduction of the matrix belonging to the equation system generated by the method. Therefore, for both methods this part was replaced by the parallel algorithm WIRS and implemented within the PVM environment. The parallelized versions of the programs EEM SOLVER and ABEEM SOLVER showed promising results, especially on a single computer with several processors (compact PVM). The implemented programs are available through the Web page http://ncbr.chemi.muni.cz/~n19n/eem_abeem.
Automation of Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

2001-01-01

The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
Developing Information Power Grid Based Algorithms and Software

NASA Technical Reports Server (NTRS)

Dongarra, Jack

1998-01-01

This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.

Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Taylor, Arthur C., III

1994-01-01

This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier-Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
Memory-based frame synchronizer. [for digital communication systems]

NASA Technical Reports Server (NTRS)

Stattel, R. J.; Niswander, J. K. (Inventor)

1981-01-01

A frame synchronizer for use in digital communications systems wherein data formats can be easily and dynamically changed is described. The use of memory array elements provides increased flexibility in format selection and sync word selection in addition to real-time reconfiguration ability. The frame synchronizer comprises a serial-to-parallel converter which converts a serial input data stream to a constantly changing parallel data output. This parallel data output is supplied to programmable sync word recognizers each consisting of a multiplexer and a random access memory (RAM). The multiplexer is connected to both the parallel data output and an address bus which may be connected to a microprocessor or computer for purposes of programming the sync word recognizer. The RAM is used as an associative memory or decoder and is programmed to identify a specific sync word. Additional programmable RAMs are used as counter decoders to define word bit length, frame word length, and paragraph frame length.
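The recognizer described in this abstract can be approximated in software. The following is a minimal sketch, not the patented circuit: the 8-bit word width, the example sync words, and the names `build_recognizer_ram`, `find_sync_positions`, and `to_bits` are all assumptions made for illustration. A shift register stands in for the serial-to-parallel converter, and a lookup table addressed by the parallel word stands in for the RAM used as a decoder.

```python
WIDTH = 8  # assumed width of the parallel word, in bits


def build_recognizer_ram(sync_word, width=WIDTH):
    """Program a lookup table ("RAM") so that only address sync_word reads 1."""
    ram = [0] * (1 << width)
    ram[sync_word] = 1
    return ram


def find_sync_positions(bits, ram, width=WIDTH):
    """Shift serial bits into a register and report where the RAM flags a sync word.

    Returns the index of the last bit of every match.
    """
    mask = (1 << width) - 1
    register = 0
    hits = []
    for i, bit in enumerate(bits):
        # Serial-to-parallel conversion: the parallel word changes with every bit.
        register = ((register << 1) | (bit & 1)) & mask
        # The parallel word addresses the RAM; a stored 1 marks a sync word.
        if i >= width - 1 and ram[register]:
            hits.append(i)
    return hits


def to_bits(value, width=WIDTH):
    """Most-significant-bit-first list of bits for value."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]
```

Reprogramming the recognizer for a different sync word only requires rebuilding the table, for example `build_recognizer_ram(0x12)` in place of `build_recognizer_ram(0xEB)`, which mirrors the dynamic reconfiguration the abstract emphasizes.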
Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

DOE Office of Scientific and Technical Information (OSTI.GOV)

G.A. Pope; K. Sepehrnoori; D.C. McKinney

1996-03-15

This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computers using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was made possible.
Parallel/distributed direct method for solving linear systems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.
Method, systems, and computer program products for implementing function-parallel network firewall

DOEpatents

Fulp, Errin W [Winston-Salem, NC]; Farley, Ryan J [Winston-Salem, NC]

2011-10-11

Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
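The partitioning idea in this abstract can be illustrated with a short sketch. It is not the patented method: the rule set, the packet fields, and the names `partition`, `node_filter`, and `filter_packet` are hypothetical, and the abstract does not say how the nodes' results are combined. The sketch assumes first-match semantics, so the match from the earliest-ranked rule wins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ordered rule set: (predicate on packet, action). First match wins.
RULES = [
    (lambda p: p["port"] == 22 and p["src"].startswith("10."), "accept"),
    (lambda p: p["port"] == 22, "deny"),
    (lambda p: p["port"] in (80, 443), "accept"),
    (lambda p: True, "deny"),  # default rule
]


def partition(rules, n_nodes):
    """Give each node a contiguous slice of (global index, rule) pairs."""
    indexed = list(enumerate(rules))
    size = -(-len(indexed) // n_nodes)  # ceiling division
    return [indexed[i:i + size] for i in range(0, len(indexed), size)]


def node_filter(portion, packet):
    """Return (global index, action) of this node's first matching rule, or None."""
    for index, (matches, action) in portion:
        if matches(packet):
            return index, action
    return None


def filter_packet(packet, portions):
    """Run every node on the same packet and keep the earliest-ranked match."""
    with ThreadPoolExecutor(max_workers=len(portions)) as pool:
        results = list(pool.map(lambda part: node_filter(part, packet), portions))
    matches = [r for r in results if r is not None]
    return min(matches)[1] if matches else "deny"
```

Each portion holds fewer rules than the full set, and the portions together cover every rule, so the combined decision matches what a single firewall scanning the whole list in order would return.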
A computer program for converting rectangular coordinates to latitude-longitude coordinates

USGS Publications Warehouse

Rutledge, A.T.

1989-01-01

A computer program was developed for converting the coordinates of any rectangular grid on a map to coordinates on a grid that is parallel to lines of equal latitude and longitude. Using this program in conjunction with groundwater flow models, the user can extract data and results from models with varying grid orientations and place these data into grid structure that is oriented parallel to lines of equal latitude and longitude. All cells in the rectangular grid must have equal dimensions, and all cells in the latitude-longitude grid measure one minute by one minute. This program is applicable if the map used shows lines of equal latitude as arcs and lines of equal longitude as straight lines and assumes that the Earth's surface can be approximated as a sphere. The program user enters the row number, column number, and latitude and longitude of the midpoint of the cell for three test cells on the rectangular grid. The latitude and longitude of boundaries of the rectangular grid also are entered. By solving sets of simultaneous linear equations, the program calculates coefficients that are used for making the conversion. As an option in the program, the user may build a groundwater model file based on a grid that is parallel to lines of equal latitude and longitude. The program reads a data file based on the rectangular coordinates and automatically forms the new data file. (USGS)
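The step in which coefficients are obtained from three test cells can be sketched as follows. The abstract does not give the form of the equations, so this example assumes a simple affine relation (latitude and longitude each a linear function of row and column), which needs exactly three test cells. The function names and the sample numbers are hypothetical, and the original program may use a different formulation.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("test cells must not lie on one line")
    solution = []
    for k in range(3):
        # Replace column k with the right-hand side.
        replaced = [row[:k] + [rhs[r]] + row[k + 1:] for r, row in enumerate(matrix)]
        solution.append(det3(replaced) / d)
    return solution


def fit_conversion(test_cells):
    """test_cells: three (row, col, lat, lon) tuples. Returns (lat_coeffs, lon_coeffs)."""
    matrix = [[1.0, float(r), float(c)] for r, c, _, _ in test_cells]
    lat = solve3(matrix, [cell[2] for cell in test_cells])
    lon = solve3(matrix, [cell[3] for cell in test_cells])
    return lat, lon


def cell_to_latlon(row, col, coeffs):
    """Apply the fitted coefficients to a (row, col) cell midpoint."""
    lat, lon = coeffs
    return (lat[0] + lat[1] * row + lat[2] * col,
            lon[0] + lon[1] * row + lon[2] * col)
```

The three test cells must not be collinear in (row, column) space; otherwise the system is singular and `solve3` raises an error.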
Oscillations and chaos in neural networks: an exactly solvable model.

PubMed Central

Wang, L P; Pichler, E E; Ross, J

1990-01-01

We consider a randomly diluted higher-order network with noise, consisting of McCulloch-Pitts neurons that interact by Hebbian-type connections. For this model, exact dynamical equations are derived and solved for both parallel and random sequential updating algorithms. For parallel dynamics, we find a rich spectrum of different behaviors including static retrieving and oscillatory and chaotic phenomena in different parts of the parameter space. The bifurcation parameters include first- and second-order neuronal interaction coefficients and a rescaled noise level, which represents the combined effects of the random synaptic dilution, interference between stored patterns, and additional background noise. We show that a marked difference in terms of the occurrence of oscillations or chaos exists between neural networks with parallel and random sequential dynamics. PMID:2251287
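The difference between the two updating schemes can be seen in a toy example. This sketch is far simpler than the model in the abstract: it uses two noiseless McCulloch-Pitts neurons with a fixed inhibitory coupling rather than a diluted higher-order Hebbian network, and all names and weights are assumptions chosen for illustration.

```python
import random


def sign(x):
    """Threshold activation of a McCulloch-Pitts neuron with states +1 and -1."""
    return 1 if x >= 0 else -1


def parallel_step(state, weights):
    """Update every neuron at once from the same previous state."""
    n = len(state)
    return [sign(sum(weights[i][j] * state[j] for j in range(n))) for i in range(n)]


def random_sequential_step(state, weights, rng):
    """Update one randomly chosen neuron, leaving the others unchanged."""
    n = len(state)
    i = rng.randrange(n)
    new_state = list(state)
    new_state[i] = sign(sum(weights[i][j] * state[j] for j in range(n)))
    return new_state


def run(state, step, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [list(state)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory
```

With `weights = [[0, -1], [-1, 0]]` and initial state `[1, 1]`, parallel updating alternates between `[1, 1]` and `[-1, -1]`, while random sequential updating settles into a state with the two neurons opposite to each other. This is only a qualitative illustration of why the updating rule matters for oscillations.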
IDEA: Interactive Display for Evolutionary Analyses.

PubMed

Egan, Amy; Mahurkar, Anup; Crabtree, Jonathan; Badger, Jonathan H; Carlton, Jane M; Silva, Joana C

2008-12-08

The availability of complete genomic sequences for hundreds of organisms promises to make obtaining genome-wide estimates of substitution rates, selective constraints and other molecular evolution variables of interest an increasingly important approach to addressing broad evolutionary questions. Two of the programs most widely used for this purpose are codeml and baseml, parts of the PAML (Phylogenetic Analysis by Maximum Likelihood) suite. A significant drawback of these programs is their lack of a graphical user interface, which can limit their user base and considerably reduce their efficiency. We have developed IDEA (Interactive Display for Evolutionary Analyses), an intuitive graphical input and output interface which interacts with PHYLIP for phylogeny reconstruction and with codeml and baseml for molecular evolution analyses. IDEA's graphical input and visualization interfaces eliminate the need to edit and parse text input and output files, reducing the likelihood of errors and improving processing time. Further, its interactive output display gives the user immediate access to results. Finally, IDEA can process data in parallel on a local machine or computing grid, allowing genome-wide analyses to be completed quickly. IDEA provides a graphical user interface that allows the user to follow a codeml or baseml analysis from parameter input through to the exploration of results. Novel options streamline the analysis process, and post-analysis visualization of phylogenies, evolutionary rates and selective constraint along protein sequences simplifies the interpretation of results. The integration of these functions into a single tool eliminates the need for lengthy data handling and parsing, significantly expediting access to global patterns in the data.
IDEA: Interactive Display for Evolutionary Analyses

PubMed Central

Egan, Amy; Mahurkar, Anup; Crabtree, Jonathan; Badger, Jonathan H; Carlton, Jane M; Silva, Joana C

2008-01-01

Background: The availability of complete genomic sequences for hundreds of organisms promises to make obtaining genome-wide estimates of substitution rates, selective constraints and other molecular evolution variables of interest an increasingly important approach to addressing broad evolutionary questions. Two of the programs most widely used for this purpose are codeml and baseml, parts of the PAML (Phylogenetic Analysis by Maximum Likelihood) suite. A significant drawback of these programs is their lack of a graphical user interface, which can limit their user base and considerably reduce their efficiency. Results: We have developed IDEA (Interactive Display for Evolutionary Analyses), an intuitive graphical input and output interface which interacts with PHYLIP for phylogeny reconstruction and with codeml and baseml for molecular evolution analyses. IDEA's graphical input and visualization interfaces eliminate the need to edit and parse text input and output files, reducing the likelihood of errors and improving processing time. Further, its interactive output display gives the user immediate access to results. Finally, IDEA can process data in parallel on a local machine or computing grid, allowing genome-wide analyses to be completed quickly. Conclusion: IDEA provides a graphical user interface that allows the user to follow a codeml or baseml analysis from parameter input through to the exploration of results. Novel options streamline the analysis process, and post-analysis visualization of phylogenies, evolutionary rates and selective constraint along protein sequences simplifies the interpretation of results. The integration of these functions into a single tool eliminates the need for lengthy data handling and parsing, significantly expediting access to global patterns in the data. PMID:19061522
A distributed version of the NASA Engine Performance Program

NASA Technical Reports Server (NTRS)

Cours, Jeffrey T.; Curlett, Brian P.

1993-01-01

Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

NASA Astrophysics Data System (ADS)

Alruwaili, Manal

With advancing technology, the number of processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass-scale application software development on massively parallel desktop computers, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massively parallel computing and distributed memory models while retaining user-friendliness. Currently available object-oriented languages for massively parallel computing, such as Chapel, X10 and UPC++, exploit distributed computing, data-parallel computing and thread parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they: 1) do not incorporate any extension for object distribution to exploit the PGAS model; 2) lack the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) lack the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in the PGAS model; propose new constructs that extend C++ with object distribution, object cloning and object migration; and integrate PGAS-based process constructs with these extensions on distributed objects. A new paradigm, MIDD (Multiple Invocation Distributed Data), is also presented, in which different copies of the same class can be invoked and work on different elements of distributed data concurrently using remote method invocations. I present the new constructs, their grammar and their behavior. The new constructs are explained using simple programs that utilize them.
Molecular pathways to parallel evolution: I. Gene nexuses and their morphological correlates.

PubMed

Zuckerkandl, E

1994-12-01

Aspects of the regulatory interactions among genes are probably as old as most genes are themselves. Correspondingly, similar predispositions to changes in such interactions must have existed for long evolutionary periods. Features of the structure and the evolution of the system of gene regulation furnish the background necessary for a molecular understanding of parallel evolution. Patently "unrelated" organs, such as the fat body of a fly and the liver of a mammal, can exhibit fractional homology, a fraction expected to become subject to quantitation. This also seems to hold for different organs in the same organism, such as wings and legs of a fly. In informational macromolecules, on the other hand, homology is indeed all or none. In the quite different case of organs, analogy is expected usually to represent attenuated homology. Many instances of putative convergence are likely to turn out to be predominantly parallel evolution, presumably including the case of the vertebrate and cephalopod eyes. Homology in morphological features reflects a similarity in networks of active genes. Similar nexuses of active genes can be established in cells of different embryological origins. Thus, parallel development can be considered a counterpart to parallel evolution. Specific macromolecular interactions leading to the regulation of the c-fos gene are given as an example of a "controller node" defined as a regulatory unit. Quantitative changes in gene control are distinguished from relational changes, and frequent parallelism in quantitative changes is noted in Drosophila enzymes. Evolutionary reversions in quantitative gene expression are also expected. The evolution of relational patterns is attributed to several distinct mechanisms, notably the shuffling of protein domains. 
The growth of such patterns may in part be brought about by a particular process of compensation for "controller gene diseases," a process that would spontaneously tend to lead to increased regulatory and organismal complexity. Despite the inferred increase in gene interaction complexity, whose course over evolutionary time is unknown, the number of homology groups for the functional and structural protein units designated as domains has probably remained rather constant, even as, in some of its branches, evolution moved toward "higher" organisms. In connection with this process, the question is raised of parallel evolution within the purview of activating and repressing master switches and in regard to the number of levels into which the hierarchies of genic master switches will eventually be resolved.
Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Tile-based Level of Detail for the Parallel Age

DOE Office of Scientific and Technical Information (OSTI.GOV)

Niski, K; Cohen, J D

Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Using the Parallel Computing Toolbox with MATLAB on the Peregrine System

Science.gov Websites

parallel pool took %g seconds.\n', toc)
% "single program multiple data"
spmd
    fprintf('Worker %d says Hello World!\n', labindex)
end
delete(gcp); % close the parallel pool
exit

To run the script on a compute node, create the file helloWorld.sub:

#!/bin/bash
#PBS -l walltime=05:00
#PBS -l nodes=1
#PBS -N
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared-memory-based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine- and coarse-grained levels, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amadio, G.; et al.

An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel in complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.
USRA/RIACS

NASA Technical Reports Server (NTRS)

Oliger, Joseph

1992-01-01

The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on 6 June 1983. RIACS is privately operated by USRA, a consortium of universities with research programs in the aerospace sciences, under a cooperative agreement with NASA. The primary mission of RIACS is to provide research and expertise in computer science and scientific computing to support the scientific missions of NASA ARC. The research carried out at RIACS must change its emphasis from year to year in response to NASA ARC's changing needs and technological opportunities. A flexible scientific staff is provided through a university faculty visitor program, a post doctoral program, and a student visitor program. Not only does this provide appropriate expertise but it also introduces scientists outside of NASA to NASA problems. A small group of core RIACS staff provides continuity and interacts with an ARC technical monitor and scientific advisory group to determine the RIACS mission. RIACS activities are reviewed and monitored by a USRA advisory council and ARC technical monitor. Research at RIACS is currently being done in the following areas: Parallel Computing; Advanced Methods for Scientific Computing; Learning Systems; High Performance Networks and Technology; Graphics, Visualization, and Virtual Environments.

Effects of intergenerational Montessori-based activities programming on engagement of nursing home residents with dementia

PubMed Central

Lee, Michelle M; Camp, Cameron J; Malone, Megan L

2007-01-01

Fourteen nursing home residents on a dementia special care unit at a skilled nursing facility took part in one-to-one intergenerational programming (IGP) with 15 preschool children from the facility’s on-site child care center. Montessori-based activities served as the interface for interactions between dyads. The amount of time residents demonstrated positive and negative forms of engagement during IGP and standard activities programming was assessed through direct observation using a tool developed for this purpose – the Myers Research Institute Engagement Scale (MRI-ES). These residents with dementia displayed the ability to successfully take part in IGP. Most successfully presented “lessons” to the children in their dyads, similar to the way that Montessori teachers present lessons to children, while persons with more severe cognitive impairment took part in IGP through other methods such as parallel play. Taking part in IGP was consistently related with higher levels of positive engagement and lower levels of negative forms of engagement in these residents with dementia than levels seen in standard activities programming on the unit. Implications of using this form of IGP, and directions for future research, are discussed. PMID:18044197
Effects of intergenerational Montessori-based activities programming on engagement of nursing home residents with dementia.

PubMed

Lee, Michelle M; Camp, Cameron J; Malone, Megan L

2007-01-01

Fourteen nursing home residents on a dementia special care unit at a skilled nursing facility took part in one-to-one intergenerational programming (IGP) with 15 preschool children from the facility's on-site child care center. Montessori-based activities served as the interface for interactions between dyads. The amount of time residents demonstrated positive and negative forms of engagement during IGP and standard activities programming was assessed through direct observation using a tool developed for this purpose--the Myers Research Institute Engagement Scale (MRI-ES). These residents with dementia displayed the ability to successfully take part in IGP. Most successfully presented "lessons" to the children in their dyads, similar to the way that Montessori teachers present lessons to children, while persons with more severe cognitive impairment took part in IGP through other methods such as parallel play. Taking part in IGP was consistently related with higher levels of positive engagement and lower levels of negative forms of engagement in these residents with dementia than levels seen in standard activities programming on the unit. Implications of using this form of IGP, and directions for future research, are discussed.
Eta Carinae - A Demanding Mistress

NASA Technical Reports Server (NTRS)

Gull, Theodore R.

2011-01-01

Over the past 15 years, a number of observers and modelers have increasingly focused on this massive system that is approaching its end stage: a supernova? A hypernova? When? The discovery by Augusto Damineli that Eta Carinae had a 5.5-year period proved timely as the newly installed STIS was primed to observe its properties in the visible and ultraviolet. Initial observations occurred in January 1998, and through multiple programs, including the multi-cycle Hubble Treasury program, have sampled changes across two cycles. Now a multi-cycle program, focused on mapping variations in the extended wind-wind collision zones through early 2015, will test 3-D models of the interacting winds. In parallel, studies have been accomplished in X-rays with RXTE and Chandra, now in the far infrared with Herschel and from the ground with VLT. Each new observation is helping to peel back the veil of mystery on this massive binary system, but also opening up more questions to be answered. Timely inclusion of laboratory studies and models has greatly enhanced the observational results. We will summarize the latest results including submitted papers and very recent results with Herschel.
The Smithsonian-led Marine Global Earth Observatory (MarineGEO): Proposed Model for a Collaborative Network Linking Marine Biodiversity to Ecosystem Processes

NASA Astrophysics Data System (ADS)

Duffy, J. E.

2016-02-01

Biodiversity - the variety of functional types of organisms - is the engine of marine ecosystem processes, including productivity, nutrient cycling, and carbon sequestration. Biodiversity remains a black box in much of ocean science, despite wide recognition that effectively managing human interactions with marine ecosystems requires understanding both structure and functional consequences of biodiversity. Moreover, the inherent complexity of biological systems puts a premium on data-rich, comparative approaches, which are best met via collaborative networks. The Smithsonian Institution's MarineGEO program links a growing network of partners conducting parallel, comparative research to understand change in marine biodiversity and ecosystems, natural and anthropogenic drivers of that change, and the ecological processes mediating it. The focus is on nearshore, seabed-associated systems where biodiversity and human population are concentrated and interact most, yet which fall through the cracks of existing ocean observing programs. MarineGEO offers a standardized toolbox of research modules that efficiently capture key elements of biological diversity and its importance in ecological processes across a range of habitats. The toolbox integrates high-tech (DNA-based, imaging) and low-tech protocols (diver surveys, rapid assays of consumer activity) adaptable to differing institutional capacity and resources. The model for long-term sustainability involves leveraging in-kind support among partners, adoption of best practices wherever possible, engagement of students and citizen scientists, and benefits of training, networking, and global relevance as incentives for participation. Here I highlight several MarineGEO comparative research projects demonstrating the value of standardized, scalable assays and parallel experiments for measuring fish and invertebrate diversity, recruitment, benthic herbivory and generalist predation, decomposition, and carbon sequestration. 
Key remaining challenges include consensus on protocols; integration of historical data; data management and access; and informatics. These challenges are common to other fields and prospects for progress in the near future are good.
Modulation of cardiac myocyte phenotype in vitro by the composition and orientation of the extracellular matrix.

PubMed

Simpson, D G; Terracio, L; Terracio, M; Price, R L; Turner, D C; Borg, T K

1994-10-01

Cellular phenotype is the result of a dynamic interaction between a cell's intrinsic genetic program and the morphogenetic signals that serve to modulate the extent to which that program is expressed. In the present study we have examined how morphogenetic information might be stored in the extracellular matrix (ECM) and communicated to the neonatal heart cell (NHC) by the cardiac alpha 1 beta 1 integrin molecule. A thin film of type I collagen (T1C) was prepared with a defined orientation. This was achieved by applying T1C to the peripheral edge of a 100 mm culture dish. The T1C was then drawn across the surface of the dish in a continuous stroke with a sterile cell scraper and allowed to polymerize. When NHCs were cultured on this substrate, they spread, as a population, along a common axis in parallel with the gel lattice and expressed an in vivo-like phenotype. Individual NHCs displayed an elongated, rod-like shape and disclosed parallel arrays of myofibrils. These phenotypic characteristics were maintained for at least 4 weeks in primary culture. The evolution of this tissue-like organizational pattern was dependent upon specific interactions between the NHCs and the collagen-based matrix that were mediated by the cardiac alpha 1 beta 1 integrin complex. This conclusion was supported by a variety of experimental results. Altering the tertiary structure of the matrix or blocking the extracellular domains of either the cardiac alpha 1 or beta 1 integrin chain inhibited the expression of the tissue-like pattern of organization. Neither cell-to-cell contact nor contractile function was necessary to induce the formation of the rod-like cell shape. However, beating activity was necessary for the assembly of a well-differentiated myofibrillar apparatus. These data suggest that the cardiac alpha 1 beta 1 integrin complex serves to detect and transduce phenotypic information stored within the tertiary structure of the surrounding matrix.
Analyzing and Visualizing Cosmological Simulations with ParaView

NASA Astrophysics Data System (ADS)

Woodring, Jonathan; Heitmann, Katrin; Ahrens, James; Fasel, Patricia; Hsu, Chung-Hsing; Habib, Salman; Pope, Adrian

2011-07-01

The advent of large cosmological sky surveys, ushering in the era of precision cosmology, has been accompanied by ever larger cosmological simulations. The analysis of these simulations, which currently encompass tens of billions of particles and up to a trillion particles in the near future, is often as daunting as carrying out the simulations in the first place. Therefore, the development of very efficient analysis tools combining qualitative and quantitative capabilities is a matter of some urgency. In this paper, we introduce new analysis features implemented within ParaView, a fully parallel, open-source visualization toolkit, to analyze large N-body simulations. A major aspect of ParaView is that it can live and operate on the same machines and utilize the same parallel power as the simulation codes themselves. In addition, data movement is a serious bottleneck now and will become even more of an issue in the future; an interactive visualization and analysis tool that can handle data in situ is fast becoming essential. The new features in ParaView include particle readers and a very efficient halo finder that identifies friends-of-friends halos and determines common halo properties, including spherical overdensity properties. In combination with many other functionalities already existing within ParaView, such as histogram routines or interfaces to programming languages like Python, this enhanced version enables fast, interactive, and convenient analyses of large cosmological simulations. In addition, development paths are available for future extensions.
Comparing the Teaching Interaction Procedure to Social Stories for People with Autism

ERIC Educational Resources Information Center

Leaf, Justin B.; Oppenheim-Leaf, Misty L.; Call, Nikki A.; Sheldon, Jan B.; Sherman, James A.; Taubman, Mitchell; McEachin, John; Dayharsh, Jamison; Leaf, Ronald

2012-01-01

This study compared social stories and the teaching interaction procedure to teach social skills to 6 children and adolescents with an autism spectrum disorder. Researchers taught 18 social skills with social stories and 18 social skills with the teaching interaction procedure within a parallel treatment design. The teaching interaction procedure…
Cache Locality Optimization for Recursive Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lifflander, Jonathan; Krishnamoorthy, Sriram

We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
A Comparison of Three Programming Models for Adaptive Applications

NASA Technical Reports Server (NTRS)

Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak; Kwak, Dochan (Technical Monitor)

2000-01-01

We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to performance gains. However, it may also suffer from the poor spatial locality of physically distributed shared data on large numbers of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates more balanced results for our application.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

PubMed

Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
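The threshold selection step mentioned in this abstract can be illustrated with a minimal sketch of Otsu's method, which picks the level that maximizes the between-class variance of a histogram. This is not the authors' implementation: the function names, the 256-level quantization, and the rule that sets the low Canny threshold to half of the high one are assumptions made only for illustration.

```python
def otsu_threshold(values, levels=256):
    """Return the level t that maximizes between-class variance (Otsu's method).

    `values` are integers in [0, levels), e.g. quantized gradient magnitudes.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of values at or below t
    sum0 = 0  # sum of values at or below t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # class 0 still empty
        if w1 == 0:
            break             # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def canny_thresholds(values, low_ratio=0.5):
    """Hypothetical helper: derive a (low, high) pair from the Otsu level.

    The 0.5 ratio is an assumed heuristic, not a value from the abstract.
    """
    high = otsu_threshold(values)
    return low_ratio * high, high
```

In a MapReduce setting, a function like `canny_thresholds` would plausibly be called once per image inside a map task, since each image can be processed independently.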
Parallel-Processing Software for Correlating Stereo Images

NASA Technical Reports Server (NTRS)

Klimeck, Gerhard; Deen, Robert; Mcauley, Michael; DeJong, Eric

2007-01-01

A computer program implements parallel-processing algorithms for correlating images of terrain acquired by stereoscopic pairs of digital stereo cameras on an exploratory robotic vehicle (e.g., a Mars rover). Such correlations are used to create three-dimensional computational models of the terrain for navigation. In this program, the scene viewed by the cameras is segmented into subimages. Each subimage is assigned to one of a number of central processing units (CPUs) operating simultaneously.
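The segmentation step described above can be sketched as follows. This is only an illustration of splitting a scene into subimages and handing them to CPUs; the tile size, the round-robin assignment policy, and both function names are assumptions, since the abstract does not say how subimages are sized or scheduled.

```python
def split_into_tiles(height, width, tile_h, tile_w):
    """Cover a height x width image with tiles (row0, row1, col0, col1).

    Edge tiles are clipped so the tiles cover the image without overlap.
    """
    tiles = []
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            tiles.append((r, min(r + tile_h, height), c, min(c + tile_w, width)))
    return tiles


def assign_round_robin(tiles, n_cpus):
    """Assumed policy: deal tiles out to CPUs in round-robin order."""
    work = [[] for _ in range(n_cpus)]
    for i, tile in enumerate(tiles):
        work[i % n_cpus].append(tile)
    return work
```

Each CPU would then run the stereo correlation on its own list of tiles; a real correlator would likely also need some overlap between neighbouring tiles, which this sketch omits.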
Optimization by nonhierarchical asynchronous decomposition

NASA Technical Reports Server (NTRS)

Shankar, Jayashree; Ribbens, Calvin J.; Haftka, Raphael T.; Watson, Layne T.

1992-01-01

Large scale optimization problems are tractable only if they are somehow decomposed. Hierarchical decompositions are inappropriate for some types of problems and do not parallelize well. Sobieszczanski-Sobieski has proposed a nonhierarchical decomposition strategy for nonlinear constrained optimization that is naturally parallel. Despite some successes on engineering problems, the algorithm as originally proposed fails on simple two dimensional quadratic programs. The algorithm is carefully analyzed for quadratic programs, and a number of modifications are suggested to improve its robustness.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kargupta, H.; Stafford, B.; Hamzaoglu, I.

This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.
National Centers for Environmental Prediction

Science.gov Websites

Reference List Table of Contents NCEP OPERATIONAL MODEL FORECAST GRAPHICS PARALLEL/EXPERIMENTAL MODEL Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS VERIFICATION (GRID VS.OBS) WEB PAGE (NCEP EXPERIMENTAL PAGE, INTERNAL USE ONLY) Interactive web page tool for
The interaction of turbulence with parallel and perpendicular shocks

NASA Astrophysics Data System (ADS)

Adhikari, L.; Zank, G. P.; Hunana, P.; Hu, Q.

2016-11-01

Interplanetary shocks exist in most astrophysical flows, and modify the properties of the background flow. We apply the Zank et al. (2012) six coupled turbulence transport model equations to study the interaction of turbulence with parallel and perpendicular shock waves in the solar wind. We model the 1D structure of a stationary perpendicular or parallel shock wave using a hyperbolic tangent function and the Rankine-Hugoniot conditions. A reduced turbulence transport model (the 4-equation model) is applied to parallel and perpendicular shock waves, and solved using a 4th-order Runge-Kutta method. We compare the model results with ACE spacecraft observations. We identify one quasi-parallel and one quasi-perpendicular event in the ACE spacecraft data sets, and compute various turbulent observed values such as the fluctuating magnetic and kinetic energy, the energy in forward and backward propagating modes, and the total turbulent energy upstream and downstream of the shock. We also calculate the error associated with each turbulent observed value, and fit the observed values by a least-squares method using a Fourier series fitting function. We find that the theoretical results are in reasonable agreement with observations. The energy in turbulent fluctuations is enhanced and the correlation length is approximately constant at the shock. Similarly, the normalized cross helicity increases across a perpendicular shock, and decreases across a parallel shock.
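The two numerical ingredients named in this abstract, a hyperbolic tangent shock profile and a 4th-order Runge-Kutta integrator, can be sketched together. The transport equation below is a toy stand-in, dE/dx = -E U'(x)/U(x), chosen only because it has the exact solution E(x) U(x) = constant; it is not one of the turbulence transport equations of the paper, and the speeds and shock width are assumed values.

```python
import math


def shock_speed(x, u_up=400.0, u_down=100.0, width=1.0):
    """Smooth 1D shock: U -> u_up far upstream (x << 0), u_down downstream."""
    mean = 0.5 * (u_up + u_down)
    half_jump = 0.5 * (u_up - u_down)
    return mean - half_jump * math.tanh(x / width)


def shock_speed_gradient(x, u_up=400.0, u_down=100.0, width=1.0):
    """Analytic derivative dU/dx of the tanh profile."""
    half_jump = 0.5 * (u_up - u_down)
    return -half_jump / (width * math.cosh(x / width) ** 2)


def toy_rhs(x, e):
    """Toy transport equation (assumption): dE/dx = -E * U'(x) / U(x)."""
    return -e * shock_speed_gradient(x) / shock_speed(x)


def rk4(rhs, x0, y0, x1, steps):
    """Classical 4th-order Runge-Kutta integration of dy/dx = rhs(x, y)."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * h, y + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h, y + 0.5 * h * k2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y
```

Integrating the toy equation from upstream to downstream, for example `rk4(toy_rhs, -5.0, 1.0, 5.0, 2000)`, can be checked against the exact ratio `shock_speed(-5.0) / shock_speed(5.0)`.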
Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Dean N.

2011-07-20

This report summarizes work carried out by the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) Team for the period of January 1, 2011 through June 30, 2011. It discusses highlights, overall progress, period goals, and collaborations and lists papers and presentations. To learn more about our project, please visit our UV-CDAT website (URL: http://uv-cdat.org). This report will be forwarded to the program manager for the Department of Energy (DOE) Office of Biological and Environmental Research (BER), national and international collaborators and stakeholders, and to researchers working on a wide range of other climate model, reanalysis, and observation evaluation activities. The UV-CDAT executive committee consists of Dean N. Williams of Lawrence Livermore National Laboratory (LLNL); Dave Bader and Galen Shipman of Oak Ridge National Laboratory (ORNL); Phil Jones and James Ahrens of Los Alamos National Laboratory (LANL), Claudio Silva of Polytechnic Institute of New York University (NYU-Poly); and Berk Geveci of Kitware, Inc. The UV-CDAT team consists of researchers and scientists with diverse domain knowledge whose home institutions also include the National Aeronautics and Space Administration (NASA) and the University of Utah. All work is accomplished under DOE open-source guidelines and in close collaboration with the project's stakeholders, domain researchers, and scientists. Working directly with BER climate science analysis projects, this consortium will develop and deploy data and computational resources useful to a wide variety of stakeholders, including scientists, policymakers, and the general public. Members of this consortium already collaborate with other institutions and universities in researching data discovery, management, visualization, workflow analysis, and provenance.
The UV-CDAT team will address the following high-level visualization requirements: (1) Alternative parallel streaming statistics and analysis pipelines - Data parallelism, Task parallelism, Visualization parallelism; (2) Optimized parallel input/output (I/O); (3) Remote interactive execution; (4) Advanced intercomparison visualization; (5) Data provenance processing and capture; and (6) Interfaces for scientists - Workflow data analysis and visualization construction tools, and Visualization interfaces.
Final Report: Correctness Tools for Petascale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mellor-Crummey, John

2014-10-27

In the course of developing parallel programs for leadership computing systems, subtle programming errors often arise that are extremely difficult to diagnose without tools. To meet this challenge, the University of Maryland, the University of Wisconsin-Madison, and Rice University worked to develop lightweight tools to help code developers pinpoint a variety of program correctness errors that plague parallel scientific codes. The aim of this project was to develop software tools that help diagnose program errors including memory leaks, memory access errors, round-off errors, and data races. Research at Rice University focused on developing algorithms and data structures to support efficient monitoring of multithreaded programs for memory access errors and data races. This is a final report about research and development work at Rice University as part of this project.
ng: What next-generation languages can teach us about HENP frameworks in the manycore era

NASA Astrophysics Data System (ADS)

Binet, Sébastien

2011-12-01

Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment; however, this no longer fits the processing model at the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R&D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, and data flow programming, to name a few. We present the prototyping of a next-generation framework in new and expressive languages (Python and Go) to investigate how code clarity and robustness are affected and what the downsides are of using languages younger than FORTRAN/C/C++.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
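Strategy (a) above relies on the fact that every sequence pair can be processed independently. A minimal sketch of that idea follows, assuming a toy scoring function in place of a real pair-wise alignment; `toy_pair_score` and `all_pair_scores` are hypothetical names and have nothing to do with the DIALIGN code base. A thread pool is used for portability; for CPU-bound alignments in Python a process pool would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_pair_score(pair):
    """Stand-in for a pair-wise alignment: count identical aligned positions."""
    a, b = pair
    return sum(1 for x, y in zip(a, b) if x == y)


def all_pair_scores(seqs, workers=4):
    """Score every unordered sequence pair, distributing pairs over workers.

    Because the pairs are independent, the result does not depend on the
    number of workers, only the wall-clock time does.
    """
    index_pairs = list(combinations(range(len(seqs)), 2))
    seq_pairs = [(seqs[i], seqs[j]) for i, j in index_pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(toy_pair_score, seq_pairs))
    return dict(zip(index_pairs, scores))
```

For n sequences there are n(n-1)/2 such tasks, which is why this step dominates the running time and also why it parallelizes so readily.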
A Programming Framework for Scientific Applications on CPU-GPU Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owens, John

2013-03-24

At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver better performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

Transputer parallel processing at NASA Lewis Research Center

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1989-01-01

The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
Multilevel summation method for electrostatic force evaluation.

PubMed

Hardy, David J; Wu, Zhe; Phillips, James C; Stone, John E; Skeel, Robert D; Schulten, Klaus

2015-02-10

The multilevel summation method (MSM) offers an efficient algorithm utilizing convolution for evaluating long-range forces arising in molecular dynamics simulations. Shifting the balance of computation and communication, MSM provides key advantages over the ubiquitous particle–mesh Ewald (PME) method, offering better scaling on parallel computers and permitting more modeling flexibility, with support for periodic systems as does PME but also for semiperiodic and nonperiodic systems. The version of MSM available in the simulation program NAMD is described, and its performance and accuracy are compared with the PME method. The accuracy feasible for MSM in practical applications reproduces PME results for water property calculations of density, diffusion constant, dielectric constant, surface tension, radial distribution function, and distance-dependent Kirkwood factor, even though the numerical accuracy of PME is higher than that of MSM. Excellent agreement between MSM and PME is found also for interface potentials of air–water and membrane–water interfaces, where long-range Coulombic interactions are crucial. Applications demonstrate also the suitability of MSM for systems with semiperiodic and nonperiodic boundaries. For this purpose, simulations have been performed with periodic boundaries along directions parallel to a membrane surface but not along the surface normal, yielding membrane pore formation induced by an imbalance of charge across the membrane. Using a similar semiperiodic boundary condition, ion conduction through a graphene nanopore driven by an ion gradient has been simulated. Furthermore, proteins have been simulated inside a single spherical water droplet. Finally, parallel scalability results show the ability of MSM to outperform PME when scaling a system of modest size (less than 100 K atoms) to over a thousand processors, demonstrating the suitability of MSM for large-scale parallel simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chrisochoides, N.; Sukup, F.

In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputers using the traditional data-parallel model has proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency at the processor level. Our preliminary performance data indicate that the task-parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the "best" sequential implementation of the BW algorithm.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Analysis of the study skills of undergraduate pharmacy students of the University of Zambia School of Medicine.

PubMed

Ezeala, Christian Chinyere; Siyanga, Nalucha

2015-01-01

This study aimed to compare the study skills of two groups of undergraduate pharmacy students in the School of Medicine, University of Zambia, using the Study Skills Assessment Questionnaire (SSAQ), with the goal of analysing students' study skills and identifying factors that affect study skills. A questionnaire was distributed to 67 participants from both programs using stratified random sampling. Completed questionnaires were rated according to participants' study skills. The total scores and scores within subscales were analysed and compared quantitatively. Questionnaires were distributed to 37 students in the regular program, and to 30 students in the parallel program. The response rate was 100%. Students had moderate to good study skills: 22 respondents (32.8%) showed good study skills, while 45 respondents (67.2%) were found to have moderate study skills. Students in the parallel program demonstrated significantly better study skills (mean SSAQ score, 185.4±14.5), particularly in time management and writing, than the students in the regular program (mean SSAQ score 175±25.4; P<0.05). No significant differences were found according to age, gender, residential or marital status, or level of study. The students in the parallel program had better time management and writing skills, probably due to their prior work experience. More intensive training is needed for students in the regular program to improve their time management and writing skills.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
High-energy physics software parallelization using database techniques

NASA Astrophysics Data System (ADS)

Argante, E.; van der Stok, P. D. V.; Willers, I.

1997-02-01

A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI.
Michael Sprague | NREL

Science.gov Websites

student, he developed a parallel spectral finite element method for treating the interaction of large mechanics of fluids, structures, and their interaction|Spectral finite-element methods for time-dependent
Soliton interactions and complexes for coupled nonlinear Schrödinger equations.

PubMed

Jiang, Yan; Tian, Bo; Liu, Wen-Jun; Sun, Kun; Li, Min; Wang, Pan

2012-03-01

Under investigation in this paper are the coupled nonlinear Schrödinger (CNLS) equations, which can be used to govern the optical-soliton propagation and interaction in such optical media as the multimode fibers, fiber arrays, and birefringent fibers. By taking the 3-CNLS equations as an example for the N-CNLS ones (N≥3), we derive the analytic mixed-type two- and three-soliton solutions in more general forms than those obtained in the previous studies with the Hirota method and symbolic computation. With the choice of parameters for those soliton solutions, soliton interactions and complexes are investigated through the asymptotic and graphic analysis. Soliton interactions and complexes with the bound dark solitons in a mode or two modes are observed, including that (i) the two bright solitons display the breatherlike structures while the two dark ones stay parallel, (ii) the two bright and dark solitons all stay parallel, and (iii) the states of the bound solitons change from the breatherlike structures to the parallel one even with the distance between those solitons smaller than that before the interaction with the regular one soliton. Asymptotic analysis is also used to investigate the elastic and inelastic interactions between the bound solitons and the regular one soliton. Furthermore, some discussions are extended to the N-CNLS equations (N>3). Our results might be helpful in such applications as the soliton switch, optical computing, and soliton amplification in the nonlinear optics.
Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

NASA Technical Reports Server (NTRS)

West, Jeff; Yang, H. Q.

2014-01-01

There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010, for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 cpus and no additional speedup was possible after 32 cpus. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problem sizes of tens of millions of cells and thousands of cpu cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that detrimental effects on the vehicle are not realized.
mm_par2.0: An object-oriented molecular dynamics simulation program parallelized using a hierarchical scheme with MPI and OPENMP

NASA Astrophysics Data System (ADS)

Oh, Kwang Jin; Kang, Ji Hoon; Myung, Hun Joo

2012-02-01

We have revised mm_par, a general-purpose parallel molecular dynamics simulation program, using object-oriented programming. We parallelized the revised version using a hierarchical scheme in order to utilize more processors for a given system size. Benchmark results are presented here.

New version program summary
Program title: mm_par2.0
Catalogue identifier: ADXP_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXP_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 2 390 858
No. of bytes in distributed program, including test data, etc.: 25 068 310
Distribution format: tar.gz
Programming language: C++
Computer: Any system operated by Linux or Unix
Operating system: Linux
Classification: 7.7
External routines: We provide wrappers for FFTW [1], the Intel MKL library [2] FFT routine, Numerical Recipes [3] FFT, random number generator, and eigenvalue solver routines, the SPRNG [4] random number generator, the Mersenne Twister [5] random number generator, and a space-filling curve routine.
Catalogue identifier of previous version: ADXP_v1_0
Journal reference of previous version: Comput. Phys. Comm. 174 (2006) 560
Does the new version supersede the previous version?: Yes
Nature of problem: Structural, thermodynamic, and dynamical properties of fluids and solids from microscopic scales to mesoscopic scales.
Solution method: Molecular dynamics simulation in the NVE, NVT, and NPT ensembles, Langevin dynamics simulation, dissipative particle dynamics simulation.
Reasons for new version: First, object-oriented programming has been used, which is known to be open for extension and closed for modification. It is also known to be better for maintenance. Second, version 1.0 was based on atom decomposition and domain decomposition schemes [6] for parallelization. However, atom decomposition is not popular due to its poor scalability. Domain decomposition scales better, but it still cannot utilize a large number of cores on recent petascale computers because the domain size must be larger than the potential cutoff distance. To go beyond this limitation, a hierarchical parallelization scheme has been adopted in this new version and implemented using MPI [7] and OpenMP [8].
Summary of revisions: (1) Object-oriented programming has been used. (2) A hierarchical parallelization scheme has been adopted. (3) The SPME routine has been fully parallelized with a parallel 3D FFT using a volumetric decomposition scheme [9].
K.J.O. thanks Mr. Seung Min Lee for useful discussion on programming and debugging.
Running time: Running time depends on system size and the methods used. For a test system containing a protein (PDB id: 5DHFR) with the CHARMM22 force field [10] and 7023 TIP3P [11] waters in a simulation box of dimension 62.23 Å × 62.23 Å × 62.23 Å, the benchmark results are given in Fig. 1. Here the potential cutoff distance was set to 12 Å and the switching function was applied from 10 Å for the force calculation in real space. For the SPME [12] calculation, K1, K2, and K3 were set to 64 and the interpolation order was set to 4. The Intel MKL library was used for the fast Fourier transform. All bonds including hydrogen atoms were constrained using the SHAKE/RATTLE algorithms [13,14]. The code was compiled using Intel compiler version 11.1 and MVAPICH2 version 1.5. Fig. 2 shows performance gains from using the CUDA-enabled version [15] of mm_par for the 5DHFR simulation in water on an Intel Core2Quad 2.83 GHz and a GeForce GTX 580. Although mm_par2.0 has not yet been ported to the GPU, these performance data are useful for estimating the GPU performance to be expected from mm_par2.0.
Fig. 1: Timing results for 1000 MD steps. The labels 1, 2, 4, and 8 in the figure denote the number of OpenMP threads.
Fig. 2: Timing results for 1000 MD steps from double precision simulation on CPU, single precision simulation on GPU, and double precision simulation on GPU.
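The domain-size limitation described in this record can be made concrete with a little arithmetic. The sketch below is not taken from mm_par2.0; the function names are hypothetical, and it assumes a simplified reading of the constraint (each domain edge must be at least the cutoff) and a hybrid layout with one MPI rank per domain and a fixed number of OpenMP threads inside each rank. It uses the box size (62.23 Å) and cutoff (12 Å) quoted in the running-time entry.

```python
import math


def max_domains(box_lengths, cutoff):
    """Largest domain grid whose domain edges are all at least `cutoff` long.

    Simplified reading of the constraint in the record: a domain may not be
    smaller than the potential cutoff distance along any axis.
    """
    return tuple(max(1, math.floor(length / cutoff)) for length in box_lengths)


def max_cores(box_lengths, cutoff, threads_per_domain=1):
    """Upper bound on usable cores, assuming one MPI rank per domain and
    `threads_per_domain` OpenMP threads per rank (an illustrative assumption)."""
    nx, ny, nz = max_domains(box_lengths, cutoff)
    return nx * ny * nz * threads_per_domain


if __name__ == "__main__":
    box = (62.23, 62.23, 62.23)  # box edge lengths in angstroms, from the record
    cutoff = 12.0                # potential cutoff in angstroms, from the record
    print("domain grid:", max_domains(box, cutoff))
    for threads in (1, 2, 4, 8):  # thread counts named in the Fig. 1 caption
        print(threads, "thread(s) per domain ->", max_cores(box, cutoff, threads), "cores")
```

Under these assumptions, pure domain decomposition caps the usable core count at the number of domains, while adding threads inside each domain multiplies that cap without shrinking any domain below the cutoff.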
Analysis and selection of optimal function implementations in massively parallel computer

DOEpatents

Archer, Charles Jens [Rochester, MN]; Peters, Amanda [Rochester, MN]; Ratterman, Joseph D [Rochester, MN]

2011-05-31

An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
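The idea of turning collected performance data into selection code can be sketched in a few lines. This is an illustration only, not the patented method: the two summation routines, the single input dimension (input length), and the names `collect_performance` and `build_selector` are all assumptions made for the example, and a closure stands in for generated selection program code.

```python
import time
from bisect import bisect_right


def sum_loop(values):
    """Implementation A: explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    """Implementation B: built-in sum()."""
    return sum(values)


def collect_performance(implementations, sizes, repeats=3):
    """Time every implementation at every input size.

    Returns {size: {name: best_seconds}}.
    """
    data = {}
    for n in sizes:
        values = list(range(n))
        data[n] = {}
        for name, fn in implementations.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                fn(values)
                best = min(best, time.perf_counter() - start)
            data[n][name] = best
    return data


def build_selector(implementations, data):
    """Build a chooser and a dispatcher from the collected timings."""
    sizes = sorted(data)
    winners = [min(data[n], key=data[n].get) for n in sizes]

    def choose(n):
        # Use the winner at the largest measured size <= n (or the smallest size).
        index = max(bisect_right(sizes, n) - 1, 0)
        return winners[index]

    def call(values):
        return implementations[choose(len(values))](values)

    return choose, call


if __name__ == "__main__":
    impls = {"loop": sum_loop, "builtin": sum_builtin}
    perf = collect_performance(impls, sizes=[10, 1000, 100000])
    choose, fast_sum = build_selector(impls, perf)
    print({n: choose(n) for n in (10, 1000, 100000)})
    print(fast_sum(list(range(50))))
```

A real system would vary several input dimensions at once and would keep the raw timings for the more detailed comparative analysis the record mentions; the lookup table here only keeps the winner per measured size.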
The Portals 4.0 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2012-11-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Automating FEA programming

NASA Technical Reports Server (NTRS)

Sharma, Naveen

1992-01-01

In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
Development for SSV on a parallel processing system (PARAGON)

NASA Astrophysics Data System (ADS)

Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan

1995-12-01

A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON, a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.
A scheme for solving the plane-plane challenge in force measurements at the nanoscale.

PubMed

Siria, Alessandro; Huant, Serge; Auvert, Geoffroy; Comin, Fabio; Chevrier, Joel

2010-05-19

Non-contact interaction between two parallel flat surfaces is a central paradigm in sciences. This situation is the starting point for a wealth of different models: the capacitor description in electrostatics, hydrodynamic flow, thermal exchange, the Casimir force, direct contact study, third body confinement such as liquids or films of soft condensed matter. The control of parallelism is so demanding that no versatile single force machine in this geometry has been proposed so far. Using a combination of nanopositioning based on inertial motors, of microcrystal shaping with a focused-ion beam (FIB) and of accurate in situ and real-time control of surface parallelism with X-ray diffraction, we propose here a "gedanken" surface-force machine that should enable one to measure interactions between movable surfaces separated by gaps in the micrometer and nanometer ranges.
GASPRNG: GPU accelerated scalable parallel random number generator library

NASA Astrophysics Data System (ADS)

Gao, Shuang; Peterson, Gregory D.

2013-04-01

Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs), along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to use GASPRNG the same way as SPRNG on traditional serial or parallel computers, as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates the same streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications.

Program summary
Catalogue identifier: AEOI_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: UTK license.
No. of lines in distributed program, including test data, etc.: 167900
No. of bytes in distributed program, including test data, etc.: 1422058
Distribution format: tar.gz
Programming language: C and CUDA.
Computer: Any PC or workstation with NVIDIA GPU (Tested on Fermi GTX480, Tesla C1060, Tesla M2070).
Operating system: Linux with CUDA version 4.0 or later. Should also run on MacOS, Windows, or UNIX.
Has the code been vectorized or parallelized?: Yes. Parallelized using MPI directives.
RAM: 512 MB to 732 MB (main memory on host CPU, depending on the data type of random numbers) / 512 MB (GPU global memory)
Classification: 4.13, 6.5.
Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations can consume a practically unlimited supply of random numbers as long as computing resources are available. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The GASPRNG library presented here accelerates the generators of independent streams of random numbers using graphics processing units (GPUs).
Solution method: Multiple copies of random number generators in GPUs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. GASPRNG is a random number generator library that allows a computational science application to employ multiple copies of random number generators to boost performance. Users can interface GASPRNG with software code executing on microprocessors and/or GPUs.
Running time: The tests provided take a few minutes to run.
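The usage pattern described in this record, where each parallel worker draws from its own generator copy, can be illustrated with a small Monte Carlo estimate. This sketch does not use the SPRNG or GASPRNG interfaces: it substitutes Python's standard `random.Random` with one distinctly seeded instance per stream, which gives reproducible, separate streams but not the parameterized independence guarantees that SPRNG is designed to provide. The names `make_streams` and `estimate_pi` are hypothetical.

```python
import random


def make_streams(n_streams, base_seed=12345):
    """Create one generator per stream, each with its own seed.

    Stand-in for per-stream generator copies; distinct seeds are an
    illustrative shortcut, not SPRNG's stream parameterization.
    """
    return [random.Random(f"{base_seed}-{i}") for i in range(n_streams)]


def estimate_pi(stream, samples):
    """Monte Carlo estimate of pi from points in the unit square."""
    hits = 0
    for _ in range(samples):
        x, y = stream.random(), stream.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


if __name__ == "__main__":
    streams = make_streams(4)
    # Each stream would belong to a separate worker in a parallel run;
    # here the workers are evaluated one after another.
    estimates = [estimate_pi(stream, 20000) for stream in streams]
    print("per-stream estimates:", estimates)
    print("combined estimate:", sum(estimates) / len(estimates))
```

Because every stream is seeded deterministically, rerunning the script reproduces the same per-stream sequences regardless of how the streams are later distributed across workers.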
Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students

ERIC Educational Resources Information Center

Valero-Mora, Pedro M.; Ledesma, Ruben D.

2011-01-01

This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…
Investigating Learning with an Interactive Tutorial: A Mixed-Methods Strategy

ERIC Educational Resources Information Center

de Villiers, M. R.; Becker, Daphne

2017-01-01

From the perspective of parallel mixed-methods research, this paper describes interactivity research that employed usability-testing technology to analyse cognitive learning processes; personal learning styles and times; and errors-and-recovery of learners using an interactive e-learning tutorial called "Relations." "Relations"…
Fenix, A Fault Tolerant Programming Framework for MPI Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamel, Marc; Teranihi, Keita; Valenzuela, Eric

2016-10-05

Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.
